Abstract

This paper, mostly expository in nature, surveys four measures of distinguishability for quantum-mechanical states. This is done from the point of view of the cryptographer, with a particular eye on applications in quantum cryptography. Each of the measures considered is rooted in an analogous classical measure of distinguishability for probability distributions: namely, the probability of an identification error, the Kolmogorov distance, the Bhattacharyya coefficient, and the Shannon distinguishability (as defined through mutual information). These measures have a long history of use in statistical pattern recognition and classical cryptography. We obtain several inequalities that relate the quantum distinguishability measures to each other, one of which may be crucial for proving the security of quantum cryptographic key distribution. In another vein, these measures and their connecting inequalities are used to define a single notion of cryptographic exponential indistinguishability for two families of quantum states. This is a tool that may prove useful in the analysis of various quantum cryptographic protocols.



1 Introduction 

The field of quantum cryptography is built around the singular idea that physical information carriers are always quantum mechanical. When this idea is taken seriously, new possibilities open up within cryptography that could not have been dreamt of before. The most successful example of this so far has been quantum cryptographic key distribution. For this task, quantum mechanics supplies a method of key distribution for which the security against eavesdropping can be assured by physical law itself. This is significant because the legitimate communicators then need make no assumptions about the computational power of their opponent.

Common to all quantum cryptographic problems is the way information is encoded into quantum systems, namely through their quantum-mechanical states. For instance, a 0 might be encoded into a system by preparing it in a state $\rho_0$, and a 1 might likewise be encoded by preparing it in a state $\rho_1$. The choice of the particular states in the encoding will generally determine not only the ease of information retrieval by the legitimate users, but also the inaccessibility of that information to a hostile opponent. Therefore, if one wants to model and analyze the cryptographic security of quantum protocols, one of the most basic questions to be answered is the following: what does it mean for two quantum states to be "close" to each other or "far" apart? Answering this question is the subject of this paper. That is, we shall be concerned with defining and relating various notions of "distance" between two quantum states.

Formally, a quantum state is nothing more than a square matrix of complex numbers that satisfies a certain set of supplementary properties. Because of this, any of the notions of distance between matrices that can be found in the mathematical literature would do for a quick fix. However, we adhere to one overriding criterion for the "distance" measures considered here. The only physical means available with which to distinguish two quantum states is that specified by the general notion of a quantum-mechanical measurement. Since the outcomes of such a measurement are necessarily indeterministic and statistical, only measures of "distance" that bear some relation to statistical-hypothesis testing will be considered. For this reason, we prefer to call the measures considered herein distinguishability measures rather than "distances."

In this paper, we discuss four notions of distinguishability that are of particular interest to cryptography: the probability of an identification error, the Kolmogorov distance (which turns out to be related to the standard trace-norm distance), the Bhattacharyya coefficient (which turns out to be related to Uhlmann's "transition probability"), and the Shannon distinguishability (which is defined in terms of the optimal mutual information obtainable about a state's identity). Each of these four distinguishability measures is, as advertised, a generalization of a distinguishability measure between two probability distributions.

Basing the quantum notions of distinguishability upon classical measures in this way has the added bonus of easily leading to various inequalities between the four measures. In particular, we establish a simple connection between the probability of error and the trace-norm distance. Moreover, we derive a very simple upper bound on the Shannon distinguishability as a function of the trace-norm distance: $\mathrm{SD}(\rho_0,\rho_1) \le \frac{1}{2}\mathrm{Tr}|\rho_0-\rho_1|$. (The usefulness of this particular form for the bound was realized while one of the authors was working on [?], where it is used to prove security of quantum key distribution for a general class of attacks.) Similarly, we can bound the quantum Shannon distinguishability by functions of the quantum Bhattacharyya coefficient.

In another connection, we consider an application of these inequalities to protocol design. In the design of cryptographic protocols, one often defines a family of protocols parameterized by a security parameter $n$, where this number denotes the length of some string, the number of rounds, the number of photons, etc. Typically the design of a good protocol requires that the probability of cheating for each participant vanish exponentially fast, i.e., be of the order $O(2^{-n})$, as $n$ increases. As an example, one technique is to compare the protocol implementation (the family of protocols) with the ideal protocol specification and to prove that these two become exponentially indistinguishable¹ [?, ?].

To move this line of thought into the quantum regime, it is natural to consider two families of quantum states parameterized by $n$ and to require that the distinguishability between the two families vanish exponentially fast. A priori, this exponential convergence could depend upon which distinguishability measure is chosen; after all, the quantum-mechanical measurements optimal for each distinguishability measure can be quite different. However, with the newly derived inequalities in hand, it is an easy matter to show that exponential indistinguishability with respect to one measure implies exponential indistinguishability with respect to each of the other three measures. In other words, these four notions are equivalent, and it is legitimate to speak of a single, unified exponential indistinguishability for two families of quantum states.

The contribution of this paper is threefold. In the first place, even though some of the quantum inequalities derived here are minor extensions of classical inequalities that have been known for some time, many of the classical inequalities are scattered throughout the literature in fields of research fairly remote from the present one. Furthermore, though elements of this work can also be found in [?], there is presently no paper that gives a systematic overview of quantum distinguishability measures from the cryptographer's point of view. In the second place, some of the inequalities in Section 6 are new, even within the classical regime. In the third place, a canonical definition for quantum exponential indistinguishability is obtained. The applications of this notion may be manifold within quantum cryptography.

The structure of the paper is as follows. In the following section we review a small bit of standard probability theory, mainly to introduce the setting and notation. Section 3 discusses density matrices and measurements, showing how the combination of the two notions leads to a probability distribution. In Section 4 we discuss four measures of distinguishability, first for classical probability distributions, then for quantum-mechanical states. After a short summary in Section 5, we discuss several inequalities, again both classically and quantum mechanically. In Section 6 these inequalities are applied to proving a theorem about exponential indistinguishability. Section 7 discusses an application of this notion; in particular, we give a simple proof of a theorem in [?] that the Shannon distinguishability of the parity (i.e., the overall exclusive-or) of a quantum-bit-string decreases exponentially with the length of the string. Moreover, the range of applicability of the theorem is strengthened in the process.

1 This notion is more commonly called statistical indistinguishability in the cryptographic literature. However, since the word "statistical" is likely to already be overused in this paper, we prefer "exponential."

This paper is aimed primarily at an audience of computer scientists, cryptographers in particular, with some small background knowledge of quantum mechanics. Readers needing a more systematic introduction to the requisite quantum theory should consult Hughes [?] or Isham [7], for instance. A very brief introduction can be found in the appendix of [?].



2 Probability distributions 

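Let $X_0$ be a stochastic variable over a finite set $\mathcal{X}$. Then we can define $p_0(x) \stackrel{\mathrm{def}}{=} \mathrm{Prob}[X_0 = x]$, so $X_0$ induces a probability distribution $p_0$ over $\mathcal{X}$. Let $p_1$ be defined likewise from a second stochastic variable $X_1$. Of course, $\sum_{x\in\mathcal{X}} p_t(x) = 1$ for $t = 0, 1$. After relabeling the outcomes $x_1, x_2, x_3, \ldots, x_m$ as $1, 2, 3, \ldots, m$ we get:

$$
\begin{array}{l|ccccc}
 & x=1 & x=2 & x=3 & \cdots & x=m \\ \hline
X_0,\ \pi_0 = \tfrac{1}{2} & p_0(1) & p_0(2) & p_0(3) & \cdots & p_0(m) \\
X_1,\ \pi_1 = \tfrac{1}{2} & p_1(1) & p_1(2) & p_1(3) & \cdots & p_1(m)
\end{array}
$$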


Here $\pi_0$ and $\pi_1$ are the a priori probabilities of the two stochastic variables; they sum to 1. Throughout this paper we take $\pi_0 = \pi_1 = \frac{1}{2}$. (Even though much of our analysis could be extended to the case $\pi_0 \neq \pi_1$, it seems not too relevant for the questions addressed here.) Two distributions are equivalent (i.e., indistinguishable) if $p_0(x) = p_1(x)$ for all $x \in \mathcal{X}$, and they are orthogonal (i.e., maximally distinguishable) if there exists no $x$ for which both $p_0(x)$ and $p_1(x)$ are nonzero.

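Let $T \in \{0,1\}$ denote the index of the stochastic variable from which a sample is drawn, and let $X$ denote the sample itself. Then $p_t(x)$ is the conditional probability that $X=x$ given that $T=t$, written as $\mathrm{Prob}[X=x \mid T=t]$. So the joint probability is half that value:

\begin{align}
\mathrm{Prob}[X=x \wedge T=t] &= \mathrm{Prob}[T=t]\,\mathrm{Prob}[X=x \mid T=t] \tag{1}\\
&= \pi_t\, p_t(x) \tag{2}\\
&= \tfrac{1}{2}\, p_t(x). \tag{3}
\end{align}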

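We define the conditional probability $r_t(x) \stackrel{\mathrm{def}}{=} \mathrm{Prob}[T=t \mid X=x]$, and the probability that $X=x$ regardless of $t$, that is, $p(x) \stackrel{\mathrm{def}}{=} \mathrm{Prob}[X=x] = \frac{1}{2}p_0(x) + \frac{1}{2}p_1(x)$. Using Bayes' theorem we get:

\begin{align}
r_t(x) &= \mathrm{Prob}[T=t \mid X=x] \tag{4}\\
&= \mathrm{Prob}[T=t]\,\mathrm{Prob}[X=x \mid T=t]\,/\,\mathrm{Prob}[X=x] \tag{5}\\
&= \tfrac{1}{2}\,p_t(x)/p(x). \tag{6}
\end{align}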
Observe that $r_0(x) + r_1(x) = 1$ for all $x$. Using $p(x)$ and $r_t(x)$ we can also represent the situation in the following way:

$$
\begin{array}{l|ccccc}
 & x=1 & x=2 & x=3 & \cdots & x=m \\ \hline
X & p(1) & p(2) & p(3) & \cdots & p(m) \\
X_0,\ \pi_0 = \tfrac{1}{2} & r_0(1) & r_0(2) & r_0(3) & \cdots & r_0(m) \\
X_1,\ \pi_1 = \tfrac{1}{2} & r_1(1) & r_1(2) & r_1(3) & \cdots & r_1(m)
\end{array}
$$
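The passage from the table of $p_t(x)$ to the table of $p(x)$ and $r_t(x)$ is mechanical. A minimal Python sketch, assuming the two distributions are given as equal-length lists indexed by outcome (the function name `posterior_representation` is hypothetical), computes both via Eqs. (4)-(6) with exact fractions:

```python
from fractions import Fraction

def posterior_representation(p0, p1):
    """Return (p, r0, r1) for equal priors pi_0 = pi_1 = 1/2.

    p[x]  = Prob[X = x] = p0[x]/2 + p1[x]/2
    rt[x] = Prob[T = t | X = x] = (1/2) pt[x] / p[x]   (Eq. (6))
    Outcomes with p[x] = 0 never occur, so their posteriors are left as None.
    """
    if len(p0) != len(p1):
        raise ValueError("p0 and p1 must be defined over the same outcomes")
    half = Fraction(1, 2)
    p = [half * a + half * b for a, b in zip(p0, p1)]
    r0 = [half * a / px if px else None for a, px in zip(p0, p)]
    r1 = [half * b / px if px else None for b, px in zip(p1, p)]
    return p, r0, r1


p0 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
p1 = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
p, r0, r1 = posterior_representation(p0, p1)
print(p)   # [Fraction(3, 8), Fraction(1, 4), Fraction(3, 8)]
print(r0)  # [Fraction(2, 3), Fraction(1, 2), Fraction(1, 3)]
```

Exact fractions make the identity $r_0(x) + r_1(x) = 1$ hold without round-off.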



3 Density matrices and measurements 

Recall that a quantum state is said to be a pure state if there exists some (fine-grained) measurement that can confirm this fact with probability 1. A pure state can be represented by a normalized vector $|\psi\rangle$ in an $N$-dimensional Hilbert space $\mathcal{H}$, i.e., a complex vector space with an inner product. Alternatively, it can be represented by the projection operator $|\psi\rangle\langle\psi|$ onto the ray spanned by that vector. In this paper $N$ is always taken to be finite.

Now consider the following preparation of a quantum system: A flips a fair coin and, depending upon the outcome, sends one of two different pure states $|\psi_0\rangle$ or $|\psi_1\rangle$ to B. Then the "pureness" of the quantum state is "diluted" by the classical uncertainty about the resulting coin flip. In this case, no deterministic fine-grained measurement generally exists for identifying A's exact preparation, and the quantum state is said to be a mixed state. B's knowledge of the system (that is, the source from which he draws his predictions about any potential measurement outcomes) can now no longer be represented by a vector in a Hilbert space. Rather, it must be described by a density operator or density matrix², formed from a statistical average of the projectors associated with A's possible fine-grained preparations; here, $\rho = \frac{1}{2}|\psi_0\rangle\langle\psi_0| + \frac{1}{2}|\psi_1\rangle\langle\psi_1|$.

Definition 1 (see for instance [?, ?, ?])

A density matrix $\rho$ is an $N \times N$ matrix with unit trace that is Hermitian (i.e., $\rho = \rho^\dagger$) and positive semi-definite (i.e., $\langle\psi|\rho|\psi\rangle \ge 0$ for all $|\psi\rangle \in \mathcal{H}$).
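The three conditions of Definition 1 are easy to test numerically. The following is a sketch under the assumption that NumPy is available; `is_density_matrix` and the tolerance `tol` are illustrative choices, not part of the definition. Positive semi-definiteness is checked through the eigenvalues, which is valid once Hermiticity has been established.

```python
import numpy as np

def is_density_matrix(rho, tol=1e-10):
    """Check Definition 1: square, unit trace, Hermitian, positive semi-definite."""
    rho = np.asarray(rho, dtype=complex)
    if rho.ndim != 2 or rho.shape[0] != rho.shape[1]:
        return False
    if abs(np.trace(rho) - 1) > tol:
        return False
    if not np.allclose(rho, rho.conj().T, atol=tol):
        return False
    # eigvalsh assumes a Hermitian input, which was verified just above
    return bool(np.linalg.eigvalsh(rho).min() >= -tol)


print(is_density_matrix([[0.5, 0], [0, 0.5]]))   # True: completely mixed state
print(is_density_matrix([[0.5, 1], [1, 0.5]]))   # False: it has eigenvalue -0.5
```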

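Example: Consider the case where A prepares either a horizontally or a vertically polarized photon. We can choose a basis such that $|H\rangle = \binom{1}{0}$ and $|V\rangle = \binom{0}{1}$. Then A's preparation is perceived by B as the mixed state

$$\rho = \tfrac{1}{2}|H\rangle\langle H| + \tfrac{1}{2}|V\rangle\langle V| = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}, \tag{7}$$

which is the "completely mixed state".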

2 In general, we shall be fairly lax about the designations "matrix" and "operator," interchanging the two rather freely. This should cause no trouble as long as one keeps in mind that all operators discussed in this paper are linear.
Note that the same density matrix will be obtained if A prepares an equal mixture of left- and right-circularly polarized photons. In fact, in this two-dimensional Hilbert space, any equal mixture of two orthogonal pure states will yield the same density matrix.
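The claim that different ensembles can share one density matrix is easy to confirm numerically. This NumPy sketch builds the $H/V$ mixture of Eq. (7), a mixture of the two circular polarizations (the phase convention $(|H\rangle \pm i|V\rangle)/\sqrt{2}$ is an assumption; the opposite sign convention gives the same result), and a mixture of a randomly chosen orthonormal pair. The helper names are hypothetical.

```python
import numpy as np

def projector(psi):
    """|psi><psi| for a (re-normalised) state vector psi."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

def equal_mixture(psi_a, psi_b):
    """Density matrix of a fair coin choosing between psi_a and psi_b."""
    return 0.5 * projector(psi_a) + 0.5 * projector(psi_b)

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)
L_circ = (H + 1j * V) / np.sqrt(2)
R_circ = (H - 1j * V) / np.sqrt(2)

rho_HV = equal_mixture(H, V)
rho_LR = equal_mixture(L_circ, R_circ)

# A random orthonormal pair: b = (-conj(a_1), conj(a_0)) is orthogonal to a
rng = np.random.default_rng(0)
a = rng.normal(size=2) + 1j * rng.normal(size=2)
b = np.array([-np.conj(a[1]), np.conj(a[0])])
rho_ab = equal_mixture(a, b)

print(np.allclose(rho_HV, rho_LR), np.allclose(rho_HV, rho_ab))  # True True
```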



Any source of quantum samples (that is, any imaginary A who secretly and randomly prepares quantum states according to some probability distribution) is called an ensemble. This can be viewed as the quantum counterpart of a stochastic variable. A density matrix completely describes B's knowledge of the sample. Two different ensembles with the same density matrix are indistinguishable as far as B is concerned; when this is the case, there exists no measurement that allows B to decide between the ensembles with a probability of success better than chance.

The fact that a density matrix describes B's a priori knowledge implies that additional classical information can change that density matrix. This is so even when no measurement is performed and the quantum system remains untouched. Two typical cases of this are: (1) when A reveals to B information about the outcome of her coin toss, or (2) when A and B share quantum entanglement (for example Einstein-Podolsky-Rosen, or EPR, particles), and A sends B the results of some measurements she performs on her system. Observe that, consequently, a density matrix is subjective in the sense that it depends on what B knows.



Example (continued): (1) Suppose that, after A has sent an equal mixture of $|H\rangle$ and $|V\rangle$, she reveals to B that for that particular sample she prepared $|V\rangle$. Then B's density matrix changes, as far as he is concerned, from

$$\begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} \quad\text{to}\quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \tag{8}$$

(2) An identical change happens in the following situation: A prepares two EPR-correlated photons in the combined pure state

$$|\Psi^-\rangle = \frac{1}{\sqrt{2}}\bigl(|H\rangle|V\rangle - |V\rangle|H\rangle\bigr), \tag{9}$$

known as the singlet state. Following that, she sends one of the photons to B. As far as B is concerned, his photon's polarization will be described by the completely mixed state. On the other hand, if A and B measure both photons with respect to the same polarization (vertical, elliptical, etc.), we can predict from the overall state that their measurement outcomes will be anti-correlated. So if, upon making a measurement, A finds that her particle is horizontally polarized (i.e., in state $|H\rangle$) and she tells this to B, then B's density matrix will change according to (8).
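Both halves of the continued example can be checked with a few lines of linear algebra. In this NumPy sketch (helper names hypothetical; A's photon is assumed to be the first tensor factor), B's reduced density matrix is obtained by a partial trace over A's photon, and the conditional state after A obtains outcome $|H\rangle$ is computed by projecting and renormalising.

```python
import numpy as np

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)

# Singlet state of Eq. (9); first tensor factor is A's photon, second is B's
psi_minus = (np.kron(H, V) - np.kron(V, H)) / np.sqrt(2)

def reduced_state_B(rho_AB):
    """Partial trace over A for a two-photon density matrix (4x4 -> 2x2)."""
    r = rho_AB.reshape(2, 2, 2, 2)      # indices (a, b, a', b')
    return np.einsum('abad->bd', r)     # sum over a = a'

def conditional_state_B(psi, a_vec):
    """A measures and finds |a_vec>; return (probability, B's updated density matrix)."""
    proj_A = np.kron(np.outer(a_vec, a_vec.conj()), np.eye(2))
    post = proj_A @ psi
    prob = np.vdot(post, post).real
    rho_AB = np.outer(post, post.conj()) / prob
    return prob, reduced_state_B(rho_AB)

rho_B = reduced_state_B(np.outer(psi_minus, psi_minus.conj()))
prob_H, rho_B_given_H = conditional_state_B(psi_minus, H)
print(np.round(rho_B.real, 3))           # completely mixed state diag(1/2, 1/2)
print(np.round(rho_B_given_H.real, 3))   # |V><V| = diag(0, 1), as in Eq. (8)
```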



As an aside, it is worthwhile to note that physicists sometimes disagree about whether the density matrix should be regarded as the state of a system or not. This, to some extent, can depend upon one's interpretation of quantum mechanics. Consider, for instance, the situation where B has not yet received the additional classical information to be sent by A. What is the state of his system? A pragmatist might answer that the state is simply described by B's density matrix. A realist, by contrast, might argue that the state is really something different, namely one of the pure states that go together to form that density matrix: B is merely ignorant of the "actual" state. For discussion of this topic we refer the reader to [?, 11]. Here we leave this deep question unanswered and adhere to the pragmatic approach, which, in any case, is more relevant from an information-theoretical point of view.

Now let us describe how to compute the probability of a certain measurement result from the density matrix. Mathematically speaking, a density matrix $\rho$ can be regarded as an object to which we can apply another operator $E_x$ to obtain a probability. In particular, taking the trace of the product of the two matrices yields the probability that the measurement result is $x$ given that the state was $\rho$, i.e., $\mathrm{Prob}[\text{result}=x \mid \text{state}=\rho] = \mathrm{Tr}(\rho E_x)$. Here $x$ serves as a label connecting the operator $E_x$ and the outcome $x$, but otherwise has no specific physical meaning. (This formula may help the reader understand the designation "density operator": it is required in order to obtain a probability density function for the possible measurement outcomes.)

Most generally, a quantum-mechanical measurement is described formally by a 
collection (ordered set) of operators, one for each outcome of the measurement. 

Definition 2 (see [?])

Let $\mathcal{E} = (E_1, \ldots, E_m)$ be a collection (ordered set) of operators such that (1) all the $E_x$ are positive semi-definite operators, and (2) $\sum_x E_x = \mathbb{1}$, where $\mathbb{1}$ is the identity operator. Such a collection specifies a Positive Operator-Valued Measure (POVM) and corresponds to the most general type of measurement that can be performed on a quantum system.

Applying a POVM to a system whose state is described by a density matrix $\rho$ results in a probability distribution according to

$$\mathrm{Prob}[\text{result}=x \mid \text{state}=\rho] = \mathrm{Tr}(\rho E_x), \tag{10}$$

where $x$ ranges from 1 to $m$.
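Equation (10) translates directly into code. The sketch below (NumPy assumed; `povm_distribution` is a hypothetical helper) first checks the two conditions of Definition 2 and then returns the outcome probabilities $\mathrm{Tr}(\rho E_x)$; condition (1) is what makes them nonnegative, and condition (2) is what makes them sum to $\mathrm{Tr}\,\rho = 1$.

```python
import numpy as np

def povm_distribution(rho, povm, tol=1e-10):
    """Return [Tr(rho E_1), ..., Tr(rho E_m)] as in Eq. (10)."""
    rho = np.asarray(rho, dtype=complex)
    ops = [np.asarray(E, dtype=complex) for E in povm]
    # Condition (2) of Definition 2: the elements sum to the identity
    if not np.allclose(sum(ops), np.eye(rho.shape[0]), atol=tol):
        raise ValueError("POVM elements must sum to the identity")
    # Condition (1): each element is Hermitian and positive semi-definite
    for E in ops:
        if not np.allclose(E, E.conj().T, atol=tol) or np.linalg.eigvalsh(E).min() < -tol:
            raise ValueError("POVM elements must be positive semi-definite")
    # For Hermitian rho and E the trace is real; discard round-off imaginary parts
    return [float(np.trace(rho @ E).real) for E in ops]


rho = [[0.75, 0.25], [0.25, 0.25]]
standard_basis = [np.diag([1, 0]), np.diag([0, 1])]
print(povm_distribution(rho, standard_basis))   # [0.75, 0.25]
```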

As an alternative for the designation POVM, one sometimes sees the term "Probability Operator Measure" used in the literature. It is a postulate of quantum mechanics that any physically realizable measurement can be described by a POVM. Moreover, for every POVM, there is in principle a physical procedure with which to carry out the associated measurement. Therefore, we can denote the set of all possible measurements, or equivalently the set of all POVMs, as $\mathcal{M}$.

Warning: It should be noted that the scheme of measurements defined here is the most general that can be contemplated within quantum mechanics. This is a convention that has gained wide usage within the physics community only relatively recently (within the last 15 years or so). Indeed, almost all older textbooks on quantum mechanics describe a more restrictive notion of measurement. In the usual approach, as developed by von Neumann, measurements are taken to be in one-to-one correspondence with the set of all Hermitian operators on the given Hilbert space. The eigenvalues of these operators correspond to the possible measurement results. The framework of POVMs described above can be fit within the older von Neumann picture if one is willing to take into account a more detailed picture of the measurement process, including all ancillary devices used along the way. The ultimate equivalence of these two pictures is captured by a formal result known as Neumark's Theorem [?].

A Projection-Valued Measure (PVM), another name for the von Neumann measurements just described, is a special case of a POVM: it is given by adding the requirement that $E_x E_y = \delta(x,y)\,E_x$, where $\delta(x,y)$ is the Kronecker delta ($\delta(x,y) = 1$ if $x = y$ and $0$ otherwise). With this requirement, the operators $E_x$ are necessarily projection operators, and so can be thought of as the eigenprojectors of a Hermitian operator. One consequence of this is that the number of (nonzero) outcomes in a PVM can never exceed the dimensionality of the Hilbert space. General POVMs need not be restricted in this way at all; moreover, the $E_x$ need not even commute.



Example: Measuring whether a photon is polarized according to angle $\alpha$ or to $\alpha + \pi/2$ is done by the POVM

$$E_1 = \begin{pmatrix} c^2 & cs \\ cs & s^2 \end{pmatrix}, \qquad E_2 = \begin{pmatrix} s^2 & -cs \\ -cs & c^2 \end{pmatrix},$$

where $c = \cos\alpha$ and $s = \sin\alpha$. This is a PVM. When applied to a photon known to be in state $|H\rangle$, for instance, this results in the probability distribution $(c^2, s^2)$, using equation (10).

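An example of a POVM which is not a PVM is the symmetric three-outcome "trine" POVM: let $\gamma = \cos(\pi/3)$ and $\sigma = \sin(\pi/3)$, and

$$E_1 = \frac{2}{3}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad E_2 = \frac{2}{3}\begin{pmatrix} \gamma^2 & \gamma\sigma \\ \gamma\sigma & \sigma^2 \end{pmatrix}, \quad E_3 = \frac{2}{3}\begin{pmatrix} \gamma^2 & -\gamma\sigma \\ -\gamma\sigma & \sigma^2 \end{pmatrix},$$

which simplifies to

$$E_1 = \frac{2}{3}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad E_2 = \frac{1}{6}\begin{pmatrix} 1 & \sqrt{3} \\ \sqrt{3} & 3 \end{pmatrix}, \quad E_3 = \frac{1}{6}\begin{pmatrix} 1 & -\sqrt{3} \\ -\sqrt{3} & 3 \end{pmatrix}.$$

Applying this POVM to the state $|V\rangle$ results in the probability distribution $(0, \frac{1}{2}, \frac{1}{2})$, again according to (10).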
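The two examples can be verified numerically. This NumPy sketch (helper names hypothetical) builds the PVM for an arbitrary angle and the trine POVM, tests the PVM condition $E_xE_y = \delta(x,y)E_x$ on each, and recovers the distributions $(c^2, s^2)$ and $(0, \frac{1}{2}, \frac{1}{2})$ quoted above.

```python
import numpy as np

def polarization_pvm(alpha):
    """Projectors onto polarization angles alpha and alpha + pi/2."""
    c, s = np.cos(alpha), np.sin(alpha)
    return [np.array([[c * c, c * s], [c * s, s * s]]),
            np.array([[s * s, -c * s], [-c * s, c * c]])]

def trine_povm():
    """Symmetric three-outcome POVM with gamma = cos(pi/3), sigma = sin(pi/3)."""
    g, sg = np.cos(np.pi / 3), np.sin(np.pi / 3)
    return [(2 / 3) * np.array([[1.0, 0.0], [0.0, 0.0]]),
            (2 / 3) * np.array([[g * g, g * sg], [g * sg, sg * sg]]),
            (2 / 3) * np.array([[g * g, -g * sg], [-g * sg, sg * sg]])]

def is_pvm(povm, tol=1e-10):
    """True iff E_x E_y = delta(x, y) E_x for every pair of elements."""
    for x, Ex in enumerate(povm):
        for y, Ey in enumerate(povm):
            target = Ex if x == y else np.zeros_like(Ex)
            if not np.allclose(Ex @ Ey, target, atol=tol):
                return False
    return True

def distribution(rho, povm):
    """Outcome probabilities Tr(rho E_x), Eq. (10), for real 2x2 examples."""
    return [float(np.trace(rho @ E)) for E in povm]

rho_H = np.array([[1.0, 0.0], [0.0, 0.0]])
rho_V = np.array([[0.0, 0.0], [0.0, 1.0]])

alpha = 0.3
print(is_pvm(polarization_pvm(alpha)), distribution(rho_H, polarization_pvm(alpha)))
print(is_pvm(trine_povm()), np.round(distribution(rho_V, trine_povm()), 12))
```

The trine fails the PVM test because $E_1^2 = \frac{4}{9}|H\rangle\langle H| \neq E_1$.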



There are two advantages to using the formalism of POVMs over that of PVMs. First, it provides a compact formalism for describing measurements that the PVM formalism can only capture by considering ancillary systems, extra time evolutions, etc., in the measurement process. Second, and most importantly, there are some situations that call for all these extra steps to obtain an optimal measurement. A simple example is that of having to distinguish between three possible states for a system with a two-dimensional Hilbert space: the optimal POVM will generally have three outcomes, whereas a direct von Neumann measurement on the system can only have two.

4 Measures of distinguishability 

We have just seen that a measurement (a POVM) applied to a density matrix results in a probability distribution. Suppose now we have two density matrices defined over the same Hilbert space. Then we find ourselves back in the classical situation described in Section 2: comparing two probability distributions over the same outcome space $\mathcal{X}$. In particular, let $\rho_0$ and $\rho_1$ be two density matrices, and let $\mathcal{E} = \{E_1, \ldots, E_m\}$ denote a POVM. Let $\rho_0(\mathcal{E})$ denote the probability distribution obtained by performing the POVM $\mathcal{E}$ on a system in state $\rho_0$ according to equation (10); let $\rho_1(\mathcal{E})$ be defined likewise. Then we have:

$$
\begin{array}{l|ccccc}
 & x=1 & x=2 & x=3 & \cdots & x=m \\ \hline
\rho_0(\mathcal{E}) & \mathrm{Tr}(\rho_0E_1) & \mathrm{Tr}(\rho_0E_2) & \mathrm{Tr}(\rho_0E_3) & \cdots & \mathrm{Tr}(\rho_0E_m) \\
\rho_1(\mathcal{E}) & \mathrm{Tr}(\rho_1E_1) & \mathrm{Tr}(\rho_1E_2) & \mathrm{Tr}(\rho_1E_3) & \cdots & \mathrm{Tr}(\rho_1E_m)
\end{array}
$$



As before, $\pi_0$ and $\pi_1$ denote the a priori probabilities and are assumed to be equal to $\frac{1}{2}$.

This section discusses four notions of distinguishability for probability distributions and, by way of the connection above, also for density matrices. The unique feature in the quantum case is the observer's freedom to choose the measurement. Since one would like to choose the quantum measurement to be as useful as possible, one should optimize each distinguishability measure over all measurements: the values singled out by this process give rise to what we call the quantum distinguishability measures.

The reader should note that being able to distinguish between probability distributions (that is, between alternative statistical hypotheses) is already an important and well-studied problem with a vast literature. It goes under the name of statistical classification, discrimination, or feature evaluation, and has had applications as far-flung as speech recognition and radar detection. For a general overview, consult [?]. The problem studied here is a special case of the general one, in the sense that we want to distinguish between two (and only two) discrete probability distributions with equal a priori probabilities.

In the following subsections each classical measure of distinguishability is dis- 
cussed first, followed by a discussion of its quantum counterpart. 
4.1 Probability of error



Consider the following experimental situation where B is asked to distinguish between two stochastic variables. A provides him with one sample, $x$, which has been secretly chosen, with equal probability, from either $X_0$ or $X_1$. B's task is to guess which of the two stochastic variables the sample came from, $X_0$ or $X_1$. Clearly, the average probability that B makes the right guess serves as a measure of distinguishability between the two probability distributions $p_0(x)$ and $p_1(x)$.

It is well known that B's optimal strategy is to look at the a posteriori probabilities: given the sample $x$, his best choice is the $t$ for which $r_t(x)$ is maximal (see the representation at the end of Section 2). This strategy is known as Bayes' strategy. So the average probability of successfully identifying the distribution equals

$$\sum_{x\in\mathcal{X}} p(x)\max\{r_0(x), r_1(x)\} = \frac{1}{2}\sum_{x\in\mathcal{X}} \max\{p_0(x), p_1(x)\},$$

where the equality follows from $p(x)\,r_t(x) = \frac{1}{2}p_t(x)$, Eq. (6). Equivalently, we can express the probability that B fails.

Definition 3 The probability of error between two probability distributions is defined by

$$\mathrm{PE}(p_0,p_1) \stackrel{\mathrm{def}}{=} \frac{1}{2}\sum_{x\in\mathcal{X}} \min\{p_0(x), p_1(x)\}. \tag{11}$$

Two identical distributions have $\mathrm{PE} = \frac{1}{2}$ and two orthogonal distributions have $\mathrm{PE} = 0$.
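Definition 3 and the optimality of Bayes' strategy can be illustrated in a few lines of Python. In this sketch (function names hypothetical), `prob_error` implements Eq. (11), and `best_rule_error` checks it by brute force over all $2^m$ deterministic guessing rules, i.e., all maps from outcomes to guesses $t \in \{0,1\}$; exact fractions avoid round-off.

```python
from fractions import Fraction
from itertools import product

def prob_error(p0, p1):
    """PE(p0, p1) = (1/2) sum_x min{p0(x), p1(x)}, Eq. (11)."""
    return Fraction(1, 2) * sum(min(a, b) for a, b in zip(p0, p1))

def best_rule_error(p0, p1):
    """Smallest error over every deterministic rule x -> guessed t (equal priors)."""
    best = None
    for rule in product((0, 1), repeat=len(p0)):
        # Guessing t = 0 on outcome x is wrong exactly when the sample came from p1
        err = Fraction(1, 2) * sum(p1[x] if g == 0 else p0[x] for x, g in enumerate(rule))
        if best is None or err < best:
            best = err
    return best

p0 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
p1 = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
print(prob_error(p0, p1), best_rule_error(p0, p1))   # 3/8 3/8
```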

Warning: PE is not a distance function: for example, when two distributions are close to one another, PE is not close to 0, but close to $\frac{1}{2}$.

In the quantum-mechanical case, the experimental set-up is almost identical. A has two ensembles, one described by $\rho_0$, the other by $\rho_1$. She provides B with a quantum sample chosen from one of the two ensembles with equal probability. Following a measurement, B must again guess from which ensemble the sample was drawn: the one described by $\rho_0$ or the one described by $\rho_1$.

For any fixed measurement, the Bayesian strategy of guessing the density operator with the largest posterior probability is optimal. Now, however, B should also make use of an extra degree of freedom: he can choose the measurement he applies to his sample. He should choose the measurement that minimizes his probability of error. So we define:

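Definition 4 The probability of error between two density matrices $\rho_0$ and $\rho_1$ is defined by

$$\mathrm{PE}(\rho_0,\rho_1) \stackrel{\mathrm{def}}{=} \min_{\mathcal{E}\in\mathcal{M}} \mathrm{PE}\bigl(\rho_0(\mathcal{E}), \rho_1(\mathcal{E})\bigr), \tag{12}$$

where the POVM $\mathcal{E}$ ranges over the set of all possible measurements $\mathcal{M}$.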

(More carefully, one should use "infimum" in this definition. However, since in all the optimization problems we shall consider here the optima can actually be attained, there is no need for the extra rigor.)
The question of finding an explicit formula for the optimal POVM in this definition was first studied by Helstrom [?, pp. 106-108]. He shows that the POVM $\mathcal{E}^*$ that minimises $\mathrm{PE}(\rho_0(\mathcal{E}), \rho_1(\mathcal{E}))$ is actually a PVM. Knowing the optimal POVM, the probability of error can be expressed explicitly. The expression he gives is

$$\mathrm{PE}(\rho_0,\rho_1) = \frac{1}{2} + \frac{1}{2}\sum_{\lambda_j<0} \lambda_j, \tag{13}$$

where the $\lambda_j$ denote the eigenvalues of the matrix $\Gamma = \rho_0 - \rho_1$.

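This expression can be cleaned up a little in the following way. Consider the function $f(x) = \frac{1}{2}(x - |x|)$. It vanishes when $x \ge 0$ and is the identity function otherwise. Thus, with its use, we can extend the summation in Eq. (13) to be over all the eigenvalues of $\Gamma$:

\begin{align}
\mathrm{PE}(\rho_0,\rho_1) &= \frac{1}{2} + \frac{1}{2}\sum_{j=1}^{N} f(\lambda_j) \tag{14}\\
&= \frac{1}{2} + \frac{1}{4}\mathrm{Tr}\,\Gamma - \frac{1}{4}\sum_{j=1}^{N}|\lambda_j| \tag{15}\\
&= \frac{1}{2} - \frac{1}{4}\mathrm{Tr}|\Gamma|, \tag{16}
\end{align}

where the last step uses $\mathrm{Tr}\,\Gamma = \mathrm{Tr}\,\rho_0 - \mathrm{Tr}\,\rho_1 = 0$ and $\sum_j |\lambda_j| = \mathrm{Tr}|\Gamma|$.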
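Hence we have the following proposition:

Proposition 1 Given two arbitrary density matrices $\rho_0$ and $\rho_1$, the probability of error equals

$$\mathrm{PE}(\rho_0,\rho_1) = \frac{1}{2} - \frac{1}{4}\sum_{j=1}^{N}|\lambda_j| = \frac{1}{2} - \frac{1}{4}\mathrm{Tr}|\rho_0-\rho_1|, \tag{17}$$

where the $\lambda_j$ are the eigenvalues of $\rho_0 - \rho_1$.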

$\mathrm{PE}(\rho_0,\rho_1)$ is therefore just a simple function of the distance between $\rho_0$ and $\rho_1$ when measured as the trace norm of their difference. (An alternative derivation of this can be found in [14].)
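Proposition 1 can be checked numerically against a measurement that attains it. In this NumPy sketch (function names hypothetical), `helstrom_pvm` guesses $\rho_0$ on the eigenspace of $\Gamma = \rho_0 - \rho_1$ with positive eigenvalues, which is a PVM, in line with Helstrom's result quoted above. Its error rate is $\frac{1}{2} - \frac{1}{2}\sum_{\lambda_j>0}\lambda_j$, which equals Eq. (17) because $\mathrm{Tr}\,\Gamma = 0$. A scan over other projective measurements in the example confirms that none does better.

```python
import numpy as np

def quantum_prob_error(rho0, rho1):
    """PE(rho0, rho1) = 1/2 - (1/4) Tr|rho0 - rho1|, Eq. (17)."""
    lam = np.linalg.eigvalsh(np.asarray(rho0) - np.asarray(rho1))
    return 0.5 - 0.25 * np.abs(lam).sum()

def helstrom_pvm(rho0, rho1):
    """E0 projects onto the positive-eigenvalue eigenspace of rho0 - rho1; E1 = 1 - E0."""
    lam, vecs = np.linalg.eigh(np.asarray(rho0) - np.asarray(rho1))
    pos = vecs[:, lam > 0]
    E0 = pos @ pos.conj().T
    return E0, np.eye(len(lam)) - E0

def error_rate(rho0, rho1, E0, E1):
    """Guess rho0 on outcome E0 and rho1 on outcome E1; equal priors."""
    return 0.5 * np.trace(rho0 @ E1).real + 0.5 * np.trace(rho1 @ E0).real

# Pure states |0> and |+> = (|0> + |1>)/sqrt(2)
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
rho1 = np.array([[0.5, 0.5], [0.5, 0.5]])
pe = quantum_prob_error(rho0, rho1)
E0, E1 = helstrom_pvm(rho0, rho1)
print(pe, error_rate(rho0, rho1, E0, E1))   # both approximately 0.1464

# No projective measurement in the real plane does better than Eq. (17)
for theta in np.linspace(0, np.pi, 181):
    v = np.array([np.cos(theta), np.sin(theta)])
    F0 = np.outer(v, v)
    assert error_rate(rho0, rho1, F0, np.eye(2) - F0) >= pe - 1e-12
```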

4.2 Kolmogorov distance 

Among (computational) cryptographers, another measure of distinguishability between probability distributions is used fairly often: the standard notions of exponential and computational indistinguishability [15, 16, ?] are based on it.

Definition 5 The Kolmogorov distance between two probability distributions is defined by

$$K(p_0,p_1) \stackrel{\mathrm{def}}{=} \frac{1}{2}\sum_{x\in\mathcal{X}} |p_0(x) - p_1(x)|. \tag{18}$$
Two identical distributions have $K = 0$, and two orthogonal distributions have $K = 1$.

In some references the factor of $\frac{1}{2}$ is omitted from the definition of the "Kolmogorov distance." Here we have included it so that $K$ takes values between 0 and 1.

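Probability of error and Kolmogorov distance are closely related.

Proposition 2

$$\mathrm{PE}(p_0,p_1) = \frac{1}{2} - \frac{1}{2}K(p_0,p_1). \tag{19}$$

This is not very difficult to prove. The most important step is to split the sum over $\mathcal{X}$ into two disjoint sub-sums, one for which $p_0(x) < p_1(x)$, and one for which $p_0(x) \ge p_1(x)$. See [17]. (Equivalently, use $\min\{a,b\} = \frac{1}{2}(a+b-|a-b|)$ together with $\sum_x p_0(x) = \sum_x p_1(x) = 1$.)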
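Proposition 2, and the sub-sum idea behind its proof, can be checked with exact arithmetic. In this sketch (names hypothetical), `kolmogorov` implements Eq. (18), and `kolmogorov_from_split` uses only the sub-sum where $p_0(x) \ge p_1(x)$: since both distributions sum to 1, the positive and negative differences have equal magnitude, so that sub-sum alone equals $K$.

```python
from fractions import Fraction

def kolmogorov(p0, p1):
    """K(p0, p1) = (1/2) sum_x |p0(x) - p1(x)|, Eq. (18)."""
    return Fraction(1, 2) * sum(abs(a - b) for a, b in zip(p0, p1))

def kolmogorov_from_split(p0, p1):
    """Sum of p0(x) - p1(x) over the outcomes where p0(x) >= p1(x)."""
    return sum(a - b for a, b in zip(p0, p1) if a >= b)

def prob_error(p0, p1):
    """PE(p0, p1), Eq. (11)."""
    return Fraction(1, 2) * sum(min(a, b) for a, b in zip(p0, p1))

p0 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
p1 = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
K = kolmogorov(p0, p1)
print(K, kolmogorov_from_split(p0, p1), prob_error(p0, p1) == Fraction(1, 2) - K / 2)
# 1/4 1/4 True
```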

In the quantum case, we must again optimise over all possible measurements. Here, however, this means that we have to find the POVM that maximises the Kolmogorov distance.

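Definition 6 The Kolmogorov distance between two density matrices $\rho_0$ and $\rho_1$ is defined by

$$K(\rho_0,\rho_1) \stackrel{\mathrm{def}}{=} \max_{\mathcal{E}\in\mathcal{M}} K\bigl(\rho_0(\mathcal{E}), \rho_1(\mathcal{E})\bigr), \tag{20}$$

where the POVM $\mathcal{E}$ ranges over the set of all possible measurements $\mathcal{M}$.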

The relation between probability of error and Kolmogorov distance (Eq. (19)) shows that the measurements that optimise PE and K are identical: $\mathcal{E}^*$ minimises $\mathrm{PE}(\rho_0(\mathcal{E}), \rho_1(\mathcal{E}))$ if and only if it maximises $K(\rho_0(\mathcal{E}), \rho_1(\mathcal{E}))$. See also the appendix of [?]. Combining Eqs. (17) and (19) we get:

Proposition 3 The Kolmogorov distance between two density matrices $\rho_0$ and $\rho_1$ equals

$$K(\rho_0,\rho_1) = \frac{1}{2}\sum_{j=1}^{N}|\lambda_j| = \frac{1}{2}\mathrm{Tr}|\rho_0-\rho_1|, \tag{21}$$

where the $\lambda_j$ are the eigenvalues of $\rho_0 - \rho_1$.

Observe that $\mathrm{Tr}|\rho_0-\rho_1|$ is simply the trace-norm distance on operators [?, 19]. Hence $K$ has the additional property of satisfying a triangle inequality. The trace-norm distance appears to be of unique significance within the class of all operator norms because of its connection to the probability of error.
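Two consequences of Proposition 3 are easy to test on random examples: the triangle inequality for $K$, and the fact that the classical $K$ obtained from any particular measurement (here, the measurement in the standard basis) can never exceed the quantum $K$ of Eq. (20). The NumPy sketch below is illustrative only; the names are hypothetical, and the random-state construction $GG^\dagger/\mathrm{Tr}(GG^\dagger)$ is an assumption chosen for convenience.

```python
import numpy as np

def quantum_kolmogorov(rho0, rho1):
    """K(rho0, rho1) = (1/2) Tr|rho0 - rho1|, Eq. (21)."""
    lam = np.linalg.eigvalsh(np.asarray(rho0) - np.asarray(rho1))
    return 0.5 * np.abs(lam).sum()

def random_density_matrix(n, rng):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = G @ G.conj().T                 # Hermitian and positive semi-definite
    return rho / np.trace(rho).real      # unit trace

rng = np.random.default_rng(7)
for _ in range(200):
    a, b, c = (random_density_matrix(3, rng) for _ in range(3))
    kab = quantum_kolmogorov(a, b)
    assert 0 <= kab <= 1 + 1e-12
    # Triangle inequality inherited from the trace norm
    assert kab <= quantum_kolmogorov(a, c) + quantum_kolmogorov(c, b) + 1e-12
    # Measuring in the standard basis yields the distributions diag(a), diag(b)
    k_basis = 0.5 * np.abs(np.diag(a).real - np.diag(b).real).sum()
    assert k_basis <= kab + 1e-12
print("all checks passed")
```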
4.3 Bhattacharyya coefficient



Another distinguishability measure that has met widespread use, mostly because it is sometimes easier to evaluate than the others, is the Bhattacharyya coefficient. See [?, ?, ?].

Definition 7 The Bhattacharyya coefficient between two probability distributions $p_0$ and $p_1$ is defined by

$$B(p_0,p_1) \stackrel{\mathrm{def}}{=} \sum_{x\in\mathcal{X}} \sqrt{p_0(x)\,p_1(x)}. \tag{22}$$

Two identical distributions have $B = 1$, and two orthogonal distributions have $B = 0$.

Warning: B is also not a distance function: for instance, when two distributions are close to one another, B is close to 1 rather than to 0. It can, however, easily be related to a distance function by taking its arccosine.

The Bhattacharyya coefficient's greatest appeal is its simplicity: it is a sort of overlap measure between the two distributions. When their overlap is zero, they are completely distinguishable; when their overlap is one, the distributions are identical and hence indistinguishable. Moreover, the Bhattacharyya coefficient can be thought of geometrically as an inner product between the unit vectors $(\sqrt{p_0(x)})_x$ and $(\sqrt{p_1(x)})_x$ in an $m$-dimensional vector space. However, it does not appear to bear a simple relation to the probability of error in any type of statistical inference problem.
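A short sketch of Definition 7 and of the two geometric remarks above, in plain Python (names hypothetical): $B$ is computed as the inner product of the unit vectors $(\sqrt{p_0(x)})_x$ and $(\sqrt{p_1(x)})_x$, and its arccosine is the angle between them. The clamp to 1 only guards against floating-point round-off.

```python
import math

def bhattacharyya(p0, p1):
    """B(p0, p1) = sum_x sqrt(p0(x) p1(x)), Eq. (22)."""
    return sum(math.sqrt(a * b) for a, b in zip(p0, p1))

def bhattacharyya_angle(p0, p1):
    """Angle arccos(B) between the unit vectors (sqrt p0(x))_x and (sqrt p1(x))_x."""
    return math.acos(min(1.0, bhattacharyya(p0, p1)))

print(bhattacharyya([0.5, 0.5], [1.0, 0.0]))         # about 0.7071
print(bhattacharyya_angle([0.5, 0.5], [1.0, 0.0]))   # about 0.7854, i.e. pi/4
```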

In the quantum case, we define a distinguishability measure by minimising over 
all possible measurements. 

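Definition 8 The Bhattacharyya coefficient between two density matrices $\rho_0$ and $\rho_1$ is defined by

$$B(\rho_0,\rho_1) \stackrel{\mathrm{def}}{=} \min_{\mathcal{E}\in\mathcal{M}} B\bigl(\rho_0(\mathcal{E}), \rho_1(\mathcal{E})\bigr), \tag{23}$$

where the POVM $\mathcal{E}$ ranges over the set of all possible measurements $\mathcal{M}$.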

The following proposition provides a closed-form expression for this distinguishability measure.

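Proposition 4 (Fuchs and Caves [22])

The quantum Bhattacharyya coefficient can be expressed as

$$B(\rho_0, \rho_1) = \operatorname{Tr} \sqrt{\rho_0^{1/2}\, \rho_1\, \rho_0^{1/2}} , \qquad (24)$$

where the square root of a positive semi-definite matrix $\rho$ denotes the unique positive semi-definite matrix $\sigma$ such that $\sigma^2 = \rho$.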
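Eq. (24) is straightforward to evaluate numerically. The sketch below assumes NumPy is available and computes matrix square roots from an eigendecomposition, which is valid because every matrix involved is Hermitian and positive semi-definite. The helper names are illustrative only. For commuting (diagonal) density matrices it reproduces the classical coefficient of Eq. (22).

```python
import numpy as np

def psd_sqrt(rho):
    """Positive semi-definite square root of a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    w = np.clip(w, 0.0, None)            # drop tiny negative eigenvalues caused by rounding
    return (v * np.sqrt(w)) @ v.conj().T

def quantum_bhattacharyya(rho0, rho1):
    """B(rho0, rho1) = Tr sqrt(rho0^(1/2) rho1 rho0^(1/2)), following Eq. (24)."""
    s0 = psd_sqrt(rho0)
    inner = s0 @ rho1 @ s0
    inner = (inner + inner.conj().T) / 2  # re-symmetrise against rounding
    return float(np.real(np.trace(psd_sqrt(inner))))

# Commuting (diagonal) density matrices reduce to the classical coefficient of Eq. (22).
print(quantum_bhattacharyya(np.diag([0.5, 0.5, 0.0]), np.diag([0.0, 0.5, 0.5])))  # about 0.5

# Two pure qubit states |0> and |+>: B equals |<0|+>| = 1/sqrt(2).
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
print(quantum_bhattacharyya(np.outer(ket0, ket0), np.outer(ketp, ketp)))  # about 0.7071
```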
Surprisingly, it turns out that B is equivalent to another, non-measurement-oriented notion of distinguishability. Suppose $|\psi_0\rangle$ and $|\psi_1\rangle$ are pure states. When we think of these two state vectors geometrically, a natural notion of distinguishability is the angle between $|\psi_0\rangle$ and $|\psi_1\rangle$, or any simple function of this angle such as the inner product or overlap. In particular, we can define $\mathrm{overlap}(|\psi_0\rangle, |\psi_1\rangle) := |\langle\psi_0|\psi_1\rangle|$ as a measure of distinguishability. The question is: what to do for mixed states?



The answer was given by Uhlmann [23]. If $\rho_0$ is the density matrix of a mixed state in the Hilbert space $\mathcal{H}_1$, then we can always extend the Hilbert space such that $\rho_0$ becomes a pure state in the combined Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$. More precisely, we can always find an ancillary Hilbert space $\mathcal{H}_2$ and a pure state $|\psi_0\rangle \in \mathcal{H}_1 \otimes \mathcal{H}_2$ such that $\operatorname{Tr}_2(|\psi_0\rangle\langle\psi_0|) = \rho_0$. Here the operator $\operatorname{Tr}_2$ denotes the partial trace over the ancillary Hilbert space $\mathcal{H}_2$. When this condition holds, $|\psi_0\rangle$ is said to be a purification of $\rho_0$. Similarly, if $|\psi_1\rangle$ is a purification of $\rho_1$, we are back to a situation with two pure states, and we can apply the formula above, leading to the following generalised definition.

Definition 9 The (generalised) overlap between two density matrices is defined by

$$\mathrm{overlap}(\rho_0, \rho_1) := \max |\langle\psi_0|\psi_1\rangle| , \qquad (25)$$

where the maximum is taken over all purifications $|\psi_0\rangle$ and $|\psi_1\rangle$ of $\rho_0$ and $\rho_1$ respectively.

It can be demonstrated [23]³ that

$$\mathrm{overlap}(\rho_0, \rho_1) = B(\rho_0, \rho_1). \qquad (26)$$
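Eq. (26) can be checked numerically for a pair of qubit states. The sketch below (assuming NumPy; all helper names are hypothetical) builds the purification $|\psi\rangle = \sum_i \sqrt{\lambda_i}\,|e_i\rangle|i\rangle$ from the spectral decomposition $\rho = \sum_i \lambda_i |e_i\rangle\langle e_i|$ and checks that tracing out the ancilla returns $\rho$. It then rotates the ancilla of the second purification with a unitary obtained from a singular-value decomposition. The approach relies on the standard fact that purifications on the same ancilla differ only by a unitary acting on the ancilla, so optimising over that unitary realizes the maximum in Eq. (25).

```python
import numpy as np

def psd_sqrt(m):
    """Positive semi-definite square root via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def quantum_bhattacharyya(rho0, rho1):
    """Closed form of Eq. (24)."""
    s0 = psd_sqrt(rho0)
    inner = s0 @ rho1 @ s0
    return float(np.real(np.trace(psd_sqrt((inner + inner.conj().T) / 2))))

def purification(rho):
    """Coefficient matrix M of |psi> = sum_i sqrt(lam_i) |e_i>|i>, i.e. M[a, b] = (<a|<b|)|psi>.
    Tracing out the second factor gives M M^dagger = rho."""
    w, v = np.linalg.eigh(rho)
    return v * np.sqrt(np.clip(w, 0.0, None))

def max_overlap_of_purifications(rho0, rho1):
    """max over ancilla unitaries U of |<psi0| (I x U) |psi1>| for the purifications above."""
    m0, m1 = purification(rho0), purification(rho1)
    # <psi0|(I x U)|psi1> = Tr(m0^dagger m1 U^T).  Writing m0^dagger m1 = L diag(sv) R (SVD),
    # the unitary U^T = R^dagger L^dagger makes the trace equal to sum(sv), its largest value.
    left, sv, right = np.linalg.svd(m0.conj().T @ m1)
    u_transpose = right.conj().T @ left.conj().T
    psi0 = m0.reshape(-1)
    psi1_rotated = (m1 @ u_transpose).reshape(-1)   # coefficients of (I x U)|psi1>
    return abs(np.vdot(psi0, psi1_rotated))

rho0 = np.array([[0.7, 0.2], [0.2, 0.3]])
rho1 = np.array([[0.4, -0.1j], [0.1j, 0.6]])
m0 = purification(rho0)
print(np.allclose(m0 @ m0.conj().T, rho0))          # the partial trace recovers rho0
print(max_overlap_of_purifications(rho0, rho1), quantum_bhattacharyya(rho0, rho1))
```

The two printed numbers should coincide up to rounding, as Eq. (26) asserts.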

Despite the rather baroque appearance $B(\rho_0, \rho_1)$ takes in Eq. (24), it is endowed with several very nice properties. For instance, $B$ is multiplicative over tensor products:

$$B(\rho_0 \otimes \rho_2, \rho_1 \otimes \rho_3) = B(\rho_0, \rho_1)\, B(\rho_2, \rho_3). \qquad (27)$$

The square of $B$ is concave in each of its arguments; i.e., if $0 \le \mu_0, \mu_1 \le 1$ and $\mu_0 + \mu_1 = 1$, then

$$B(\rho, \mu_0\rho_0 + \mu_1\rho_1)^2 \ge \mu_0\, B(\rho, \rho_0)^2 + \mu_1\, B(\rho, \rho_1)^2 . \qquad (28)$$

Moreover, $B$ itself is doubly (i.e., jointly) concave⁴:

$$B(\mu_0\rho_0 + \mu_1\rho_1, \mu_0\rho_2 + \mu_1\rho_3) \ge \mu_0\, B(\rho_0, \rho_2) + \mu_1\, B(\rho_1, \rho_3). \qquad (29)$$
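Eqs. (27), (28) and (29) can be spot-checked on random states. This sketch assumes NumPy. The random-state construction $\rho = GG^\dagger/\operatorname{Tr}(GG^\dagger)$ with a complex Gaussian matrix $G$, the seed and the weights $\mu_0 = 0.3$, $\mu_1 = 0.7$ are arbitrary illustrative choices. A numerical check of this kind is no substitute for the proofs.

```python
import numpy as np

def psd_sqrt(m):
    """Positive semi-definite square root via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def B(rho0, rho1):
    """Quantum Bhattacharyya coefficient, Eq. (24)."""
    s0 = psd_sqrt(rho0)
    inner = s0 @ rho1 @ s0
    return float(np.real(np.trace(psd_sqrt((inner + inner.conj().T) / 2))))

def random_density_matrix(d, rng):
    """Random full-rank density matrix G G^dagger / Tr(G G^dagger), G complex Gaussian."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(7)
r0, r1, r2, r3 = (random_density_matrix(2, rng) for _ in range(4))
mu0, mu1 = 0.3, 0.7

# Eq. (27): multiplicativity over tensor products (the two numbers should agree)
print(B(np.kron(r0, r2), np.kron(r1, r3)), B(r0, r1) * B(r2, r3))
# Eq. (28): B^2 is concave in its second argument
print(B(r0, mu0 * r1 + mu1 * r2) ** 2 >= mu0 * B(r0, r1) ** 2 + mu1 * B(r0, r2) ** 2)
# Eq. (29): B is jointly concave
print(B(mu0 * r0 + mu1 * r1, mu0 * r2 + mu1 * r3) >= mu0 * B(r0, r2) + mu1 * B(r1, r3))
```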



3 A nice review of this theorem in terms of finite-dimensional Hilbert space methods can be found in [24].



4 We thank C. M. Caves for pointing this out to us.
4.4 Shannon distinguishability



Now we come to the last, and maybe most important, notion of distinguishability. Mutual information, as defined by Shannon pjj, can be used as a distinguishability measure between probability distributions j2(| [12]. We assume that the reader is familiar with the (Shannon) entropy function H, the argument of which can be either a stochastic variable or a probability distribution. $H_2(p) = -p\log p - (1-p)\log(1-p)$ is the entropy of the distribution $(p, 1-p)$; logarithms are taken to base 2 throughout.

Consider the following elementary example. Suppose we have two boxes, each containing colored balls. Let $t \in T = \{0, 1\}$ denote the identity of the boxes, and let us think of $T$ as a stochastic variable. Then $\mathrm{Prob}[T = t]$ is just the a priori probability $\pi_t$ introduced earlier. Recall that in our case $\pi_0 = \pi_1 = \frac{1}{2}$, so $H(T) = 1$. Let $X$ denote the stochastic variable corresponding to the color of a ball upon being drawn from a box, taking into account that the identity of the box is itself a stochastic variable. Recall that $\mathrm{Prob}[X = x]$ was written as $p(x)$.



Consider the same experiment as in Subsection 4.1 , in which A picks a ball from 
one of the two boxes and gives it to B. One can ask now: How much information 
does X (the color of a picked ball) convey about T (the identity of the box it 
came from)? 

Information is defined as the reduction of uncertainty, where uncertainty is quantified using the Shannon entropy. Consider two quantities: (1) B's average uncertainty about the color X of a ball when he does not know which box it came from; and (2) his average uncertainty about X when he does know the identity t of the box. Their difference expresses the amount of information that X and T carry about each other; by the symmetry of mutual information, this equals the information B gains about T from the color of the ball he is handed. It can also be used as a measure of distinguishability between two distributions. When there is no difference between the distributions, the amount of information that can be gained in this way is zero. When the distributions are orthogonal, all the information about T can be gathered. Thus we obtain:

$$\begin{aligned}
\text{average information} &= (\text{average uncertainty about } X) && (30)\\
&\quad - (\text{average uncertainty about } X \text{ given } t) && (31)\\
&= H(p) - \left(\tfrac{1}{2} H(p_0) + \tfrac{1}{2} H(p_1)\right) && (32)\\
&= H(X) - \sum_t \mathrm{Prob}[T = t]\, H(X \mid T = t) && (33)\\
&= H(X) - H(X \mid T) && (34)\\
&= I(T; X). && (35)
\end{aligned}$$

This leads to the following definition:

Definition 10 The Shannon distinguishability between two probability distributions $p_0$ and $p_1$ is defined by

$$SD(p_0, p_1) := I(T; X). \qquad (36)$$
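For classical distributions, Definition 10 can be evaluated through Eq. (32), with p(x) the average of $p_0(x)$ and $p_1(x)$ because the a priori probabilities are equal. The sketch below is a minimal Python version. The two "boxes" are made-up three-color distributions, and logarithms are taken to base 2 so that SD lies between 0 and 1.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; terms with probability 0 contribute 0."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

def shannon_distinguishability(p0, p1):
    """SD(p0, p1) = I(T;X) for equal priors, computed as in Eq. (32):
    H(p) - (H(p0) + H(p1)) / 2 with p(x) = (p0(x) + p1(x)) / 2."""
    p = [(a + b) / 2 for a, b in zip(p0, p1)]
    return entropy(p) - (entropy(p0) + entropy(p1)) / 2

# Two hypothetical boxes of colored balls (red, green, blue).
box0 = [0.5, 0.5, 0.0]
box1 = [0.0, 0.5, 0.5]
print(shannon_distinguishability(box0, box1))  # 0.5
```

Identical boxes give 0 and boxes with no color in common give 1, the two extremes described in the text.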
Since the mutual information is symmetric in its two random variables, it can also be expanded in the other direction:

$$\begin{aligned}
SD(p_0, p_1) &= I(T; X) && (37)\\
&= H(T) - H(T \mid X) && (38)\\
&= 1 - \sum_x p(x)\, H(T \mid X = x) && (39)\\
&= 1 - \sum_x p(x)\, H_2(r_0(x)), && (40)
\end{aligned}$$

where $r_0(x) = \mathrm{Prob}[T = 0 \mid X = x]$ is the posterior probability that a ball of color x came from box 0. This form will be useful for several of the proofs to come.
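The form (40) needs only the posterior probabilities $r_0(x) = p_0(x)/(2p(x))$. The sketch below (hypothetical function names) computes SD that way and, for comparison, through Eq. (32). By the symmetry of mutual information the two values should agree up to rounding.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

def h2(r):
    """Binary entropy H_2(r) in bits, with H_2(0) = H_2(1) = 0."""
    if r <= 0.0 or r >= 1.0:
        return 0.0
    return -r * math.log2(r) - (1 - r) * math.log2(1 - r)

def sd_via_eq32(p0, p1):
    """SD as H(p) - (H(p0) + H(p1)) / 2, i.e. H(X) - H(X|T)."""
    p = [(a + b) / 2 for a, b in zip(p0, p1)]
    return entropy(p) - (entropy(p0) + entropy(p1)) / 2

def sd_via_eq40(p0, p1):
    """SD as 1 - sum_x p(x) H_2(r_0(x)), i.e. H(T) - H(T|X), with r_0(x) = p0(x) / (2 p(x))."""
    total = 0.0
    for a, b in zip(p0, p1):
        px = (a + b) / 2
        if px > 0:                      # outcomes that never occur contribute nothing
            total += px * h2(a / (2 * px))
    return 1.0 - total

q0 = [0.1, 0.6, 0.3]
q1 = [0.4, 0.4, 0.2]
print(sd_via_eq32(q0, q1), sd_via_eq40(q0, q1))   # the two expansions agree up to rounding
```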

In the same fashion as for the other distinguishability measures, the Shannon distinguishability can be extended to the quantum case: we optimise it over all pairs of probability distributions that can be obtained by applying a quantum measurement to the two states.

Definition 11 The Shannon distinguishability between two density matrices $\rho_0$ and $\rho_1$ is defined as

$$SD(\rho_0, \rho_1) := \max_{\mathcal{E}} SD(p_0(\mathcal{E}), p_1(\mathcal{E})), \qquad (41)$$

where the POVM $\mathcal{E}$ ranges over the set of all possible measurements $\mathcal{M}$.

There is an unfortunate problem with this measure of distinguishability: calculating the value $SD(\rho_0, \rho_1)$ is generally difficult. Apart from a few special cases, no explicit formula for SD solely in terms of $\rho_0$ and $\rho_1$ is known. Even stronger: no such formula can exist in the general case [27]. This follows from the fact that optimising the Shannon distinguishability requires the solution of a transcendental equation. (See also and for a discussion of other aspects of SD.)
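Since Eq. (41) has no closed form, one crude way to explore it numerically is a brute-force search over a restricted family of measurements. The sketch below (assuming NumPy and real 2 × 2 density matrices; the helper names are hypothetical) scans two-outcome projective measurements in the real plane, so in general it yields only a lower bound on $SD(\rho_0, \rho_1)$. For the pure states $(\cos\alpha, \pm\sin\alpha)^T$ used in the example, the symmetric basis $(1, \pm 1)/\sqrt{2}$ attains $1 - H_2((1 - \sin 2\alpha)/2)$. This coincides with the upper bound stated later in Eq. (48) for these states, so the search should recover that value.

```python
import math
import numpy as np

def h2(r):
    """Binary entropy in bits, with H_2(0) = H_2(1) = 0."""
    if r <= 0.0 or r >= 1.0:
        return 0.0
    return -r * math.log2(r) - (1 - r) * math.log2(1 - r)

def sd_classical(p0, p1):
    """Classical SD via Eq. (40), with equal priors."""
    total = 0.0
    for a, b in zip(p0, p1):
        px = (a + b) / 2
        if px > 0:
            total += px * h2(a / (2 * px))
    return 1.0 - total

def sd_projective_lower_bound(rho0, rho1, steps=4000):
    """Maximise SD over projective measurements onto the real basis
    e = (cos t, sin t), f = (-sin t, cos t) for t in [0, pi).
    Only this restricted family is searched, so in general the result is a
    lower bound on SD(rho0, rho1) as defined in Eq. (41)."""
    best = 0.0
    for j in range(steps):
        t = math.pi * j / steps
        e = np.array([math.cos(t), math.sin(t)])
        f = np.array([-math.sin(t), math.cos(t)])
        p0 = [float(np.real(e @ rho0 @ e)), float(np.real(f @ rho0 @ f))]
        p1 = [float(np.real(e @ rho1 @ e)), float(np.real(f @ rho1 @ f))]
        best = max(best, sd_classical(p0, p1))
    return best

alpha = 0.3
psi0 = np.array([math.cos(alpha), math.sin(alpha)])
psi1 = np.array([math.cos(alpha), -math.sin(alpha)])
rho0, rho1 = np.outer(psi0, psi0), np.outer(psi1, psi1)
print(sd_projective_lower_bound(rho0, rho1))
print(1 - h2((1 - math.sin(2 * alpha)) / 2))   # value attained by the basis (1, +-1)/sqrt(2)
```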

4.5 Overview 



The material presented in the previous four subsections can be summarized in 
the following table. 





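Measure | Classical definition | When p0 = p1 | When p0 ⊥ p1 | Optimality criterion | Quantum expression
PE | $\frac{1}{2}\sum_x \min\{p_0(x), p_1(x)\}$ | 1/2 | 0 | min | $\frac{1}{2} - \frac{1}{4}\operatorname{Tr}\lvert\rho_0 - \rho_1\rvert$
K | $\frac{1}{2}\sum_x \lvert p_0(x) - p_1(x)\rvert$ | 0 | 1 | max | $\frac{1}{2}\operatorname{Tr}\lvert\rho_0 - \rho_1\rvert$
B | $\sum_x \sqrt{p_0(x)\,p_1(x)}$ | 1 | 0 | min | $\operatorname{Tr}\sqrt{\rho_0^{1/2}\rho_1\rho_0^{1/2}}$
SD | via $I(T;X)$ | 0 | 1 | max | no simple form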
5 Inequalities

We have seen already (Eq. (19)) that probability of error and Kolmogorov distance are related through the equality

$$PE(p_0, p_1) = \tfrac{1}{2} - \tfrac{1}{2} K(p_0, p_1). \qquad (42)$$

The other pairs of distinguishability measures are related through inequalities, 
some of which can be found in the literature [^0|, plj [l?], [l2j . 

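Proposition 5 Let $p_0$ and $p_1$ be probability distributions. The following relations hold:

$$1 - B(p_0, p_1) \le K(p_0, p_1) \le \sqrt{1 - B(p_0, p_1)^2}, \qquad (43)$$

$$1 - H_2(PE(p_0, p_1)) \le SD(p_0, p_1) \le 1 - 2\,PE(p_0, p_1), \qquad (44)$$

$$1 - B(p_0, p_1) \le SD(p_0, p_1) \le 1 - H_2\!\left(\tfrac{1}{2} - \tfrac{1}{2}\sqrt{1 - B(p_0, p_1)^2}\right). \qquad (45)$$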
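Proposition 5 is easy to spot-check on random distributions. The sketch below (plain Python; the seed, the number of outcomes and the tolerance are arbitrary choices, and the function names are hypothetical) evaluates PE, K, B and SD directly from their classical definitions and tests Eqs. (43), (44) and (45). The small tolerance only absorbs floating-point rounding.

```python
import math
import random

def h2(r):
    """Binary entropy in bits, with H_2(0) = H_2(1) = 0."""
    if r <= 0.0 or r >= 1.0:
        return 0.0
    return -r * math.log2(r) - (1 - r) * math.log2(1 - r)

def pe(p0, p1):
    return 0.5 * sum(min(a, b) for a, b in zip(p0, p1))

def kolmogorov(p0, p1):
    return 0.5 * sum(abs(a - b) for a, b in zip(p0, p1))

def bhattacharyya(p0, p1):
    return sum(math.sqrt(a * b) for a, b in zip(p0, p1))

def sd(p0, p1):
    """Eq. (40) with equal priors."""
    total = 0.0
    for a, b in zip(p0, p1):
        px = (a + b) / 2
        if px > 0:
            total += px * h2(a / (2 * px))
    return 1.0 - total

def random_distribution(m, rng):
    w = [rng.random() for _ in range(m)]
    s = sum(w)
    return [x / s for x in w]

def satisfies_proposition_5(p0, p1, tol=1e-12):
    k, b, e, s = kolmogorov(p0, p1), bhattacharyya(p0, p1), pe(p0, p1), sd(p0, p1)
    root = math.sqrt(max(0.0, 1.0 - b * b))
    eq43 = 1 - b <= k + tol and k <= root + tol
    eq44 = 1 - h2(e) <= s + tol and s <= 1 - 2 * e + tol
    eq45 = 1 - b <= s + tol and s <= 1 - h2(0.5 - 0.5 * root) + tol
    return eq43 and eq44 and eq45

rng = random.Random(2024)
trials = [(random_distribution(5, rng), random_distribution(5, rng)) for _ in range(1000)]
print(all(satisfies_proposition_5(p0, p1) for p0, p1 in trials))
```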



Before giving the proof of this proposition, we state its quantum equivalent. 
This is the main result of the paper. 

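Theorem 1 Proposition 5 can be generalized to the quantum scenario: one can substitute the quantum versions of PE, K, B and SD and use density matrices $\rho_0$ and $\rho_1$ as operands. Alternatively, using the quantum expressions, Eqs. (43), (44) and (45) can be expressed in the following equivalent form:

$$1 - B(\rho_0, \rho_1) \le \tfrac{1}{2}\operatorname{Tr}|\rho_0 - \rho_1| \le \sqrt{1 - B(\rho_0, \rho_1)^2}, \qquad (46)$$

$$1 - H_2\!\left(\tfrac{1}{2} - \tfrac{1}{4}\operatorname{Tr}|\rho_0 - \rho_1|\right) \le SD(\rho_0, \rho_1) \le \tfrac{1}{2}\operatorname{Tr}|\rho_0 - \rho_1|, \qquad (47)$$

$$1 - B(\rho_0, \rho_1) \le SD(\rho_0, \rho_1) \le 1 - H_2\!\left(\tfrac{1}{2} - \tfrac{1}{2}\sqrt{1 - B(\rho_0, \rho_1)^2}\right). \qquad (48)$$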

The importance of this theorem is that, while the quantum Shannon distinguishability cannot in general be calculated in closed form, the inequalities provide a useful way to bound it. We will use these bounds in an application in the next section.
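Of the three quantum relations, Eq. (46) involves only closed-form quantities, so it can be checked directly on random density matrices; Eqs. (47) and (48) would additionally need SD, which has no closed form. The sketch below assumes NumPy and computes $\frac{1}{2}\operatorname{Tr}|\rho_0 - \rho_1|$ from the eigenvalues of the Hermitian matrix $\rho_0 - \rho_1$. The helper names and the random-state construction are illustrative only.

```python
import math
import numpy as np

def psd_sqrt(m):
    """Positive semi-definite square root via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def quantum_bhattacharyya(rho0, rho1):
    """Eq. (24)."""
    s0 = psd_sqrt(rho0)
    inner = s0 @ rho1 @ s0
    return float(np.real(np.trace(psd_sqrt((inner + inner.conj().T) / 2))))

def quantum_kolmogorov(rho0, rho1):
    """(1/2) Tr|rho0 - rho1|: half the sum of absolute eigenvalues of the Hermitian difference."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho0 - rho1)).sum())

def random_density_matrix(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def satisfies_eq46(rho0, rho1, tol=1e-9):
    k = quantum_kolmogorov(rho0, rho1)
    b = quantum_bhattacharyya(rho0, rho1)
    return (1 - b <= k + tol) and (k <= math.sqrt(max(0.0, 1 - b * b)) + tol)

rng = np.random.default_rng(1)
pairs = [(random_density_matrix(3, rng), random_density_matrix(3, rng)) for _ in range(500)]
print(all(satisfies_eq46(r0, r1) for r0, r1 in pairs))
```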

Proof of Proposition 5

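We start by proving Eq. (43). To get the left-hand inequality, note:

$$\begin{aligned}
1 - B(p_0, p_1) &= \frac{1}{2}\sum_x \left(p_0(x) + p_1(x) - 2\sqrt{p_0(x)\,p_1(x)}\right) && (49)\\
&= \frac{1}{2}\sum_x \left(\sqrt{p_0(x)} - \sqrt{p_1(x)}\right)^2 && (50)\\
&\le \frac{1}{2}\sum_x |p_0(x) - p_1(x)| && (51)\\
&= K(p_0, p_1). && (52)
\end{aligned}$$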
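The inequality in the penultimate step holds for each term individually, since $(\sqrt{a} - \sqrt{b})^2 \le |\sqrt{a} - \sqrt{b}|\,(\sqrt{a} + \sqrt{b}) = |a - b|$ for $a, b \ge 0$. To get the right-hand inequality, we simply use the Schwarz inequality:

$$\begin{aligned}
K(p_0, p_1)^2 &= \left(\frac{1}{2}\sum_x |p_0(x) - p_1(x)|\right)^2 && (53)\\
&= \frac{1}{4}\left(\sum_x \left|\sqrt{p_0(x)} - \sqrt{p_1(x)}\right| \left(\sqrt{p_0(x)} + \sqrt{p_1(x)}\right)\right)^2 && (54)\\
&\le \frac{1}{4}\sum_x \left(\sqrt{p_0(x)} - \sqrt{p_1(x)}\right)^2 \sum_x \left(\sqrt{p_0(x)} + \sqrt{p_1(x)}\right)^2 && (55)\\
&= \frac{1}{4}\left(2 - 2B(p_0, p_1)\right)\left(2 + 2B(p_0, p_1)\right) && (56)\\
&= 1 - B(p_0, p_1)^2. && (57)
\end{aligned}$$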

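In order to prove the left inequality of Eq. (44) we observe that it is a special case of the Fano inequality (see for instance jpO|):

$$H(T \mid X) \le H_2(PE(p_0, p_1)) + PE(p_0, p_1)\log(\#T - 1),$$

where $\#T = 2$ is the cardinality of the set T. The last term therefore vanishes, and $SD(p_0, p_1) = H(T) - H(T \mid X) = 1 - H(T \mid X) \ge 1 - H_2(PE(p_0, p_1))$.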

For the right-hand inequality of Eq. (44) we expand $I(X;T)$ as $H(T) - H(T \mid X)$ to obtain an inequality between SD and PE. (See also [p9|.) Recall the definitions of $r_t(x)$ and $p(x)$, observing that $r_1(x) = 1 - r_0(x)$ and that $2\min\{r, 1 - r\} \le H_2(r)$ for all r between 0 and 1 (see Figure 1). Since $p(x)\,r_t(x) = \frac{1}{2}p_t(x)$, we obtain:

$$\begin{aligned}
SD(p_0, p_1) &= 1 - \sum_x p(x)\, H_2(r_0(x)) && (58)\\
&\le 1 - \sum_x p(x)\cdot 2\min\{r_0(x), 1 - r_0(x)\} && (59)\\
&= 1 - 2\,PE(p_0, p_1). && (60)
\end{aligned}$$



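The left-hand inequality of Eq. (45) is obtained in a similar way. Using the fact that $H_2(r) \le 2\sqrt{r(1 - r)}$ (see Figure 1), we get:

$$\begin{aligned}
SD(p_0, p_1) &= 1 - \sum_x p(x)\, H_2(r_0(x)) && (61)\\
&\ge 1 - \sum_x p(x)\cdot 2\sqrt{r_0(x)\,r_1(x)} && (62)\\
&= 1 - \sum_x \sqrt{p_0(x)\,p_1(x)} && (63)\\
&= 1 - B(p_0, p_1). && (64)
\end{aligned}$$

For the right-hand side of Eq. (45), we define the function

$$g(r) := \tfrac{1}{2} - \tfrac{1}{2}\sqrt{1 - r^2}. \qquad (65)$$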
Figure 1: $2\min\{x, 1 - x\} \le H_2(x) \le 2\sqrt{x(1 - x)}$

(Formally, these inequalities are proven by looking at the first and second derivatives.)



It can be verified that $k(r) = 2\sqrt{r(1 - r)}$ is the inverse of $g(r)$ when $0 \le r \le \frac{1}{2}$. Moreover, $k(r) = k(1 - r) = k(\min\{r, 1 - r\})$. Using the facts that $H_2(r) = H_2(1 - r) = H_2(\min\{r, 1 - r\})$, that $\sum_x p(x)\, k(r_0(x)) = B(p_0, p_1)$ (as in Eqs. (62) and (63)), and that $H_2(g(r))$ is a convex function, we obtain by Jensen's inequality that



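$$\begin{aligned}
SD(p_0, p_1) &= 1 - \sum_{x \in X} p(x)\, H_2(r_0(x)) && (66)\\
&= 1 - \sum_{x \in X} p(x)\, H_2(\min\{r_0(x), 1 - r_0(x)\}) && (67)\\
&= 1 - \sum_{x \in X} p(x)\, H_2\big(g(k(\min\{r_0(x), 1 - r_0(x)\}))\big) && (68)\\
&= 1 - \sum_{x \in X} p(x)\, H_2\big(g(k(r_0(x)))\big) && (69)\\
&\le 1 - H_2\Big(g\Big(\sum_{x \in X} p(x)\, k(r_0(x))\Big)\Big) && (70)\\
&= 1 - H_2\big(g(B(p_0, p_1))\big) && (71)\\
&= 1 - H_2\Big(\tfrac{1}{2} - \tfrac{1}{2}\sqrt{1 - B(p_0, p_1)^2}\Big). && (72)
\end{aligned}$$

This concludes the proof of Proposition 5. □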
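The three elementary facts used in this proof can be spot-checked numerically on a grid: the bounds of Figure 1, the fact that k inverts g on [0, 1/2], and the convexity of $H_2(g(r))$ invoked for Jensen's inequality. The sketch below is plain Python with an arbitrary grid spacing of 0.001; it is a sanity check, not a proof.

```python
import math

def h2(x):
    """Binary entropy in bits, with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def g(r):
    """g(r) = 1/2 - (1/2) sqrt(1 - r^2), Eq. (65)."""
    return 0.5 - 0.5 * math.sqrt(max(0.0, 1.0 - r * r))

def k(r):
    """k(r) = 2 sqrt(r (1 - r))."""
    return 2.0 * math.sqrt(max(0.0, r * (1.0 - r)))

grid = [i / 1000 for i in range(1001)]

# Figure 1: 2 min{x, 1 - x} <= H_2(x) <= 2 sqrt(x (1 - x)) = k(x)
figure1_ok = all(2 * min(x, 1 - x) <= h2(x) + 1e-12 and h2(x) <= k(x) + 1e-12 for x in grid)

# k inverts g on [0, 1/2]: g(k(r)) = r
inverse_ok = all(abs(g(k(r)) - r) < 1e-9 for r in grid if r <= 0.5)

# Discrete convexity of r -> H_2(g(r)) on [0, 1]: all second differences are non-negative
f = [h2(g(r)) for r in grid]
convex_ok = all(f[i - 1] + f[i + 1] - 2 * f[i] >= -1e-12 for i in range(1, len(f) - 1))

print(figure1_ok, inverse_ok, convex_ok)
```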
The main tool in proving the quantum versions of these inequalities is the observation that all the bounds are appropriately monotonic in their arguments.

Proof of Theorem 1:

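First we prove Eq. (46). We start with the first inequality. Let $\mathcal{E}^*_B$ denote a POVM that optimizes B and define $\mathcal{E}^*_K$ likewise.

$$\begin{aligned}
1 - B(\rho_0, \rho_1) &= 1 - B(p_0(\mathcal{E}^*_B), p_1(\mathcal{E}^*_B)) && (73)\\
&\le K(p_0(\mathcal{E}^*_B), p_1(\mathcal{E}^*_B)) && (74)\\
&\le K(p_0(\mathcal{E}^*_K), p_1(\mathcal{E}^*_K)) && (75)\\
&= K(\rho_0, \rho_1). && (76)
\end{aligned}$$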

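The second inequality of Eq. (46) follows from

$$\begin{aligned}
K(\rho_0, \rho_1) &= K(p_0(\mathcal{E}^*_K), p_1(\mathcal{E}^*_K)) && (77)\\
&\le \sqrt{1 - B(p_0(\mathcal{E}^*_K), p_1(\mathcal{E}^*_K))^2} && (78)\\
&\le \sqrt{1 - B(p_0(\mathcal{E}^*_B), p_1(\mathcal{E}^*_B))^2} && (79)\\
&= \sqrt{1 - B(\rho_0, \rho_1)^2}. && (80)
\end{aligned}$$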

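Equations (47) and (48) are proven in an identical way. In particular, in Eq. (46) the functions on the extreme left, $f(x) = 1 - x$, and on the extreme right, $f(x) = \sqrt{1 - x^2}$, are both monotonically decreasing, while B must be minimized and K must be maximized. The analogous statements hold for Eqs. (47) and (48): the outer bounds $1 - H_2(x)$ (for $0 \le x \le \frac{1}{2}$), $1 - 2x$, $1 - x$ and $1 - H_2\big(\frac{1}{2} - \frac{1}{2}\sqrt{1 - x^2}\big)$ are all monotonically decreasing, PE and B are minimized, and SD is maximized. □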



6 Exponential indistinguishability 

As already described in the Introduction, in the solution of various cryptographic 
tasks, one often actually designs a whole family of protocols. These are param- 
eterized by a security parameter, n: a number that might denote the length of 
some string, the number of rounds, or the number of photons transmitted, for 
instance. Typically the design of a good protocol requires that the probability of cheating for each participant vanishes exponentially fast, i.e., it goes as $O(2^{-n})$.
As an example, one technique is to compare the protocol implementation (the 
family of protocols) with the ideal protocol specification and to prove that these 
two become exponentially indistinguishable |^|, |j| . 

Definition 12 Let $\{X_0\} = (X_0^{(1)}, X_0^{(2)}, X_0^{(3)}, \ldots)$ denote a family of stochastic variables with corresponding distributions $(p_0^{(1)}, p_0^{(2)}, p_0^{(3)}, \ldots)$. Let $\{X_1\}$ be defined similarly. Then $\{X_0\}$ and $\{X_1\}$ are exponentially indistinguishable if there exist an $n_0$ and an $\varepsilon$ between 0 and 1 such that

$$\forall n > n_0 : K(p_0^{(n)}, p_1^{(n)}) < \varepsilon^n .$$

Examples of exponentially indistinguishable stochastic-variable families can be constructed easily. For instance, let $X_0^{(n)}$ be uniformly distributed over $\{0, 1\}^n$, the set of strings of length n. That is to say, for each $x \in \{0, 1\}^n$, we have $p_0^{(n)}(x) = 2^{-n}$. This defines the family of uniform distributions $\{X_0\}$. Let $\{X_1\}$ be defined identically, except that $p_1^{(n)}(0^n) = 0$, while $p_1^{(n)}(1^n) = 2^{-n+1}$. So for $\{X_1\}$, $0^n$, the word with all zeroes, has zero probability, while $1^n$, the word with all ones, has double the probability it had in the uniform distribution. Clearly the two families $\{X_0\}$ and $\{X_1\}$ are exponentially indistinguishable: $K(p_0^{(n)}, p_1^{(n)}) = \frac{1}{2}(2^{-n} + 2^{-n}) = 2^{-n}$, which is smaller than $\varepsilon^n$ for any $\varepsilon$ with $\frac{1}{2} < \varepsilon < 1$.
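The example can be verified exactly with rational arithmetic for small n. This sketch (plain Python; the function names are hypothetical) enumerates all strings of length n, builds both distributions, and confirms that $K(p_0^{(n)}, p_1^{(n)}) = 2^{-n}$.

```python
from fractions import Fraction
from itertools import product

def example_families(n):
    """Exact distributions p0^(n) (uniform) and p1^(n) (0^n removed, 1^n doubled) on {0,1}^n."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    p0 = {z: Fraction(1, 2 ** n) for z in strings}
    p1 = dict(p0)
    p1["0" * n] = Fraction(0)
    p1["1" * n] = Fraction(2, 2 ** n)
    return p0, p1

def kolmogorov(p0, p1):
    """K = (1/2) sum_x |p0(x) - p1(x)| over a common support given as dict keys."""
    return sum(abs(p0[z] - p1[z]) for z in p0) / 2

for n in range(1, 9):
    p0, p1 = example_families(n)
    assert sum(p1.values()) == 1        # p1^(n) is still normalized
    print(n, kolmogorov(p0, p1))        # prints 2^(-n) as a fraction, e.g. "3 1/8"
```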

The reader should be aware that in computational cryptography more refined notions of distinguishability have been defined. For polynomial indistinguishability, it is only required that the families converge as fast as $1/n^k$, for some $k > 0$. Though we will not argue it formally here, it is not hard to see that the proof of Lemma 1 generalizes to apply to polynomially indistinguishable families.

Yet another refinement is computational indistinguishability. For it, a sample is given to a Turing machine (or a polynomial-size family of circuits), and we look at the Kolmogorov distance of the possible outputs. After maximizing over all Turing machines, we say the stochastic-variable families are computationally indistinguishable if the distance between them converges to zero polynomially fast. Computational indistinguishability has turned out to be extremely powerful for defining notions such as pseudo-random number generators Jl^] and zero-knowledge protocols §. All these notions of protocol indistinguishability have in common that, if a distinguisher is given a sample and restricted to polynomial-time calculations, then he will not be able to identify the source of the sample.

Here we shall follow the computational-cryptographic tradition in defining exponential indistinguishability via the Kolmogorov distance. However, this choice is in no way crucial: the next lemma shows that we could have taken any of the four distinguishability measures. In other words, K, PE, B and SD turn out to be equivalent when we require exponentially fast convergence.⁵

5 There is a small technicality here: indistinguishable distributions have PE = 1/2 and B = 1, so exponential indistinguishability means convergence to those values, instead of convergence to 0, as is the case with K and SD.

Lemma 1 Let $\{X_0\}$ and $\{X_1\}$ be two families of stochastic variables that are exponentially indistinguishable with respect to one of the distinguishability measures K, PE, B, SD. Then $\{X_0\}$ and $\{X_1\}$ are exponentially indistinguishable with respect to each of K, PE, B, SD.

Proof: The equivalence between exponential indistinguishability for PE and K follows from Eq. (19). The other equivalences follow from Eqs. (43) through (45). For instance, the proof that exponential indistinguishability for K implies exponential indistinguishability for B goes as follows. Suppose

$$[\exists n_0, \varepsilon\ \forall n > n_0] : K(p_0^{(n)}, p_1^{(n)}) < \varepsilon^n . \qquad (81)$$

Using the left-hand side of Eq. (43), it follows at once that $B(p_0^{(n)}, p_1^{(n)}) > 1 - \varepsilon^n$. Since $B(p_0^{(n)}, p_1^{(n)})$ is bounded above by unity, we obtain the desired exponential convergence of B to 1.

The other implications are proven in a similar way. As far as expressions involving $H_2(x)$ are concerned, it is sufficient to recall (see Figure 1) that

$$2\min\{x, 1 - x\} \le H_2(x) \le 2\sqrt{x(1 - x)} . \qquad (82)$$

This concludes the proof. □
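To see Lemma 1 at work, one can evaluate all four measures for the example families introduced after Definition 12. The sketch below (plain Python; helper names are illustrative) uses the fact that all strings other than $0^n$ and $1^n$ have the same probability under both distributions, so only three classes of outcomes need to be tracked. Scaling each deviation from "indistinguishable" by $2^n$ shows that $K$, $1 - 2PE$, $1 - B$ and $SD$ all decay at the same exponential rate for this example: the scaled values stay constant in n.

```python
import math

def h2(r):
    """Binary entropy in bits, with H_2(0) = H_2(1) = 0."""
    if r <= 0.0 or r >= 1.0:
        return 0.0
    return -r * math.log2(r) - (1 - r) * math.log2(1 - r)

def measures(n):
    """K, PE, B and SD for the example families, using three outcome classes
    given as (p0(x), p1(x), number of strings x in the class)."""
    u = 2.0 ** -n
    classes = [(u, 0.0, 1), (u, 2 * u, 1), (u, u, 2 ** n - 2)]   # 0^n, 1^n, all others
    k = 0.5 * sum(m * abs(a - b) for a, b, m in classes)
    pe = 0.5 * sum(m * min(a, b) for a, b, m in classes)
    bc = sum(m * math.sqrt(a * b) for a, b, m in classes)
    # Eq. (40): p(x) = (a + b) / 2 and r_0(x) = a / (a + b)
    sd = 1.0 - sum(m * (a + b) / 2 * h2(a / (a + b)) for a, b, m in classes if a + b > 0)
    return k, pe, bc, sd

for n in (1, 5, 10, 20):
    k, pe, bc, sd = measures(n)
    print(n, k * 2 ** n, (1 - 2 * pe) * 2 ** n, (1 - bc) * 2 ** n, sd * 2 ** n)
```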

The obvious next step is to define exponential indistinguishability for density 
matrices, and to show that the choice of the distinguishability measure is im- 
material. 

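Definition 13 Let $\{\rho_0^{(n)}\} = (\rho_0^{(1)}, \rho_0^{(2)}, \rho_0^{(3)}, \ldots)$ denote a family of density matrices defined over the Hilbert space $\mathcal{H}$. Let $\{\rho_1^{(n)}\}$ be defined similarly. Then $\{\rho_0^{(n)}\}$ and $\{\rho_1^{(n)}\}$ are exponentially indistinguishable if there exist an $n_0$ and an $\varepsilon$ between 0 and 1 such that

$$\forall n > n_0 : K(\rho_0^{(n)}, \rho_1^{(n)}) < \varepsilon^n .$$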

An example that makes use of this definition will be presented in the next section. However, first let us conclude with the quantum analogue of Lemma 1.

Theorem 2 Let $\{\rho_0^{(n)}\}$ and $\{\rho_1^{(n)}\}$ be two families of density matrices which are exponentially indistinguishable with respect to one of the distinguishability measures K, PE, B, SD. Then $\{\rho_0^{(n)}\}$ and $\{\rho_1^{(n)}\}$ are exponentially indistinguishable with respect to each of K, PE, B, SD.

PROOF: This follows immediately from Lemma 1 and Theorem 1. □

7 Applications 

Let us now look at an application of the idea of quantum exponential indistinguishability. In particular, we look at the problem of the parity bit in quantum key distribution as studied in || (henceforth called BMS). Let $|\psi_0\rangle = (\cos\alpha, \sin\alpha)^T$ and $|\psi_1\rangle = (\cos\alpha, -\sin\alpha)^T$, and let $\rho_0$ and $\rho_1$ be the corresponding density matrices. That is to say, the bits 0 and 1 that contribute to constructing a cryptographic key are encoded into a physical system (a photon, say) via $\rho_0$ and $\rho_1$. Likewise, the bit string $z = z_1 z_2 \cdots z_n$ is represented by n different photons, the ith photon being in state $\rho_{z_i}$. Thus the combined state for the string z is given by

$$\rho_z = \rho_{z_1} \otimes \rho_{z_2} \otimes \cdots \otimes \rho_{z_n} , \qquad (83)$$

where $\otimes$ stands for the tensor product.

Now let $Z_0^{(n)}$ denote all the strings of length n with even parity (i.e., with overall exclusive-or equal to 0) and $Z_1^{(n)}$ all strings of length n with odd parity. Then define

$$\rho_j^{(n)} = \frac{1}{2^{n-1}} \sum_{z \in Z_j^{(n)}} \rho_z \qquad (84)$$

for $j = 0, 1$. In BMS these two density matrices are explicitly calculated in order to compute their Shannon distinguishability as a function of n and $\alpha$. This is extremely important because the parity bit appears in the proof of security jl| of the BB84 key exchange protocol ||3l[.

Here we compute the distinguishability between $\rho_0^{(n)}$ and $\rho_1^{(n)}$ in terms of Kolmogorov distance and Bhattacharyya coefficient. For the special case n = 2 we also study the inequalities obtained in Theorem 1, as an illustration of how tight the bounds are. Observe that, at this point in time, the problem of the parity bit is one of the few non-trivial (i.e., multi-dimensional Hilbert-space) examples for which the Shannon distinguishability, Kolmogorov distance (and related probability of error) and Bhattacharyya coefficient can be computed. For the next few paragraphs the reader is advised to consult BMS, or to take Eqs. (85), (97) and (98) below as given.

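First let us calculate the Kolmogorov distance $K(\rho_0^{(n)}, \rho_1^{(n)})$ as a function of n and $\alpha$. In BMS it is shown that $\Delta^{(n)} = \frac{1}{2}(\rho_0^{(n)} - \rho_1^{(n)})$ has non-zero entries only on the secondary diagonal. Moreover, it is not difficult to see that all these entries equal $c^n s^n$, where $c = \cos\alpha$ and $s = \sin\alpha$. Therefore $\Delta^{(n)}$ has $2^{n-1}$ eigenvalues equal to $-c^n s^n$ and $2^{n-1}$ eigenvalues equal to $+c^n s^n$, so

$$K(\rho_0^{(n)}, \rho_1^{(n)}) = \sum_i |\lambda_i| = 2^n c^n s^n = \sin^n 2\alpha = S^n , \qquad (85)$$

where the $\lambda_i$ are the eigenvalues of $\Delta^{(n)}$ and $S = \sin 2\alpha$. Clearly, $\{\rho_0^{(n)}\}$ and $\{\rho_1^{(n)}\}$ are exponentially indistinguishable for all values of $\alpha \ne \pi/4$ in the range $0 \le \alpha \le \pi/4$ considered here, since then $S < 1$. (Note that BMS proved exponential indistinguishability only for the case that $\alpha \approx 0$.)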
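The claims about $\Delta^{(n)}$ and Eq. (85) can be verified by building $\rho_0^{(n)}$ and $\rho_1^{(n)}$ directly from Eqs. (83) and (84) for small n. The sketch below assumes NumPy, uses the states $|\psi_0\rangle = (\cos\alpha, \sin\alpha)^T$ and $|\psi_1\rangle = (\cos\alpha, -\sin\alpha)^T$ given above, and picks the arbitrary value α = 0.4. The function names are hypothetical.

```python
import itertools
import numpy as np

def parity_bit_states(n, alpha):
    """rho_0^(n) and rho_1^(n) of Eq. (84): averages of rho_z (Eq. (83)) over the
    2^(n-1) strings z of length n with even and odd parity, respectively."""
    c, s = np.cos(alpha), np.sin(alpha)
    single = [np.outer(v, v) for v in (np.array([c, s]), np.array([c, -s]))]
    dim = 2 ** n
    rho = [np.zeros((dim, dim)), np.zeros((dim, dim))]
    for z in itertools.product((0, 1), repeat=n):
        rho_z = np.ones((1, 1))
        for bit in z:
            rho_z = np.kron(rho_z, single[bit])
        rho[sum(z) % 2] += rho_z / 2 ** (n - 1)
    return rho[0], rho[1]

def kolmogorov(rho0, rho1):
    """(1/2) Tr|rho0 - rho1| from the eigenvalues of the Hermitian difference."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho0 - rho1)).sum())

alpha = 0.4
for n in range(1, 6):
    r0, r1 = parity_bit_states(n, alpha)
    delta = (r0 - r1) / 2
    # all entries on the secondary diagonal should equal c^n s^n, all others 0
    anti_diagonal = np.fliplr(np.eye(2 ** n)) * (np.cos(alpha) * np.sin(alpha)) ** n
    print(n, np.allclose(delta, anti_diagonal), kolmogorov(r0, r1), np.sin(2 * alpha) ** n)
```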

Computing the Bhattacharyya coefficient between $\rho_0^{(n)}$ and $\rho_1^{(n)}$ is a more elaborate calculation. In BMS Eqs. (19) and (20) it is shown that, with a minor change of basis, $\rho_0^{(n)}$ and $\rho_1^{(n)}$ can be taken to be block-diagonal with 2 × 2 blocks. The 2 × 2 blocks are of the form

$$\sigma_0^{(n,k)} = \begin{pmatrix} c^{2(n-k)} s^{2k} & c^n s^n \\ c^n s^n & c^{2k} s^{2(n-k)} \end{pmatrix} \quad \text{for even parity,} \qquad (86)$$

and

$$\sigma_1^{(n,k)} = \begin{pmatrix} c^{2(n-k)} s^{2k} & -c^n s^n \\ -c^n s^n & c^{2k} s^{2(n-k)} \end{pmatrix} \quad \text{for odd parity,} \qquad (87)$$

where k ranges between 0 and n. For each $0 \le k \le \lfloor n/2 \rfloor$, the blocks $\sigma_j^{(n,k)}$ and $\sigma_j^{(n,n-k)}$ make an appearance a total of $\frac{1}{2}\binom{n}{k}$ times.

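With this as a starting point, let us develop a convenient notation. If $\sigma$ is an $n \times n$ positive semi-definite matrix of the form

$$\sigma = \begin{pmatrix} \sigma'' & 0_{pq} \\ 0_{qp} & \sigma' \end{pmatrix}, \qquad (88)$$

where $\sigma''$ is a $p \times p$ matrix, $\sigma'$ is a $q \times q$ matrix, $0_{pq}$ is the $p \times q$ zero matrix, and $n = p + q$, then we shall write this as $\sigma = \sigma'' \oplus \sigma'$. In this fashion, we have

$$\rho_0^{(n)} = \bigoplus_{k=1}^{2^{n-1}} P_{(0,k)} \qquad (89)$$

for the appropriate 2 × 2 matrices $P_{(0,k)}$. Similarly for $\rho_1^{(n)}$.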
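It is not difficult to see that the following three equalities hold:

$$\operatorname{Tr}(\sigma'' \oplus \sigma') = \operatorname{Tr}(\sigma'') + \operatorname{Tr}(\sigma'), \qquad (90)$$

$$(\sigma'' \oplus \sigma')^{1/2} = (\sigma'')^{1/2} \oplus (\sigma')^{1/2}, \qquad (91)$$

$$(\sigma_0'' \oplus \sigma_0')(\sigma_1'' \oplus \sigma_1') = (\sigma_0''\sigma_1'') \oplus (\sigma_0'\sigma_1'). \qquad (92)$$

From this it follows that

$$\operatorname{Tr}\sqrt{(\sigma_0'' \oplus \sigma_0')^{1/2}(\sigma_1'' \oplus \sigma_1')(\sigma_0'' \oplus \sigma_0')^{1/2}} = \operatorname{Tr}\sqrt{(\sigma_0'')^{1/2}\sigma_1''(\sigma_0'')^{1/2}} + \operatorname{Tr}\sqrt{(\sigma_0')^{1/2}\sigma_1'(\sigma_0')^{1/2}}, \qquad (93)$$

which, writing $\sigma_j = \sigma_j'' \oplus \sigma_j'$, we can express in a short-hand notation as⁶

$$B(\sigma_0, \sigma_1) = B(\sigma_0'', \sigma_1'') + B(\sigma_0', \sigma_1'). \qquad (94)$$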



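Thus we can evaluate $B(\rho_0^{(n)}, \rho_1^{(n)})$ by evaluating each block individually and summing the results. Each block has vanishing determinant, so it is of the form $|u\rangle\langle u|$ for an (unnormalized) vector $u$, and $B(|u\rangle\langle u|, |v\rangle\langle v|) = |\langle u|v\rangle|$. In particular, we find that

$$B(\sigma_0^{(n,k)}, \sigma_1^{(n,k)}) = \left|c^{2(n-k)} s^{2k} - c^{2k} s^{2(n-k)}\right|, \qquad (95)$$

$$B(\sigma_0^{(n,n-k)}, \sigma_1^{(n,n-k)}) = \left|c^{2(n-k)} s^{2k} - c^{2k} s^{2(n-k)}\right|. \qquad (96)$$

Summing up over all blocks of $\rho^{(n)}$ we get

$$B(\rho_0^{(n)}, \rho_1^{(n)}) = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{k} \left|c^{2(n-k)} s^{2k} - c^{2k} s^{2(n-k)}\right|. \qquad (97)$$

6 Note that the expressions in this short-hand version are not proper Bhattacharyya coefficients: they are not normalized properly.

Figure 2: Equation (47) for the parity bit with n = 2 and with $\alpha \in [0, \pi/4]$ on the horizontal axis.

For the case n = 2 this expression reduces to

$$B(\rho_0^{(2)}, \rho_1^{(2)}) = |c^4 - s^4| = |(c^2 - s^2)(c^2 + s^2)| = |C| ,$$

where $C = \cos 2\alpha$.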
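Eq. (97) can be compared against a direct numerical evaluation of Eq. (24) on the matrices of Eq. (84). This sketch assumes NumPy and Python 3.8 or later (for `math.comb`); the helper names and α = 0.4 are illustrative. Because the parity-bit matrices are rank-deficient, the eigendecomposition-based square roots pick up small rounding errors, so agreement is only expected to several decimal places.

```python
import itertools
import math
import numpy as np

def parity_bit_states(n, alpha):
    """rho_0^(n) and rho_1^(n) of Eq. (84), built from Eq. (83)."""
    c, s = math.cos(alpha), math.sin(alpha)
    single = [np.outer(v, v) for v in (np.array([c, s]), np.array([c, -s]))]
    dim = 2 ** n
    rho = [np.zeros((dim, dim)), np.zeros((dim, dim))]
    for z in itertools.product((0, 1), repeat=n):
        rho_z = np.ones((1, 1))
        for bit in z:
            rho_z = np.kron(rho_z, single[bit])
        rho[sum(z) % 2] += rho_z / 2 ** (n - 1)
    return rho[0], rho[1]

def psd_sqrt(m):
    """Positive semi-definite square root via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def quantum_bhattacharyya(rho0, rho1):
    """Eq. (24) evaluated numerically."""
    s0 = psd_sqrt(rho0)
    inner = s0 @ rho1 @ s0
    return float(np.real(np.trace(psd_sqrt((inner + inner.conj().T) / 2))))

def bhattacharyya_eq97(n, alpha):
    """Closed form of Eq. (97)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return sum(math.comb(n, k) * abs(c ** (2 * (n - k)) * s ** (2 * k) - c ** (2 * k) * s ** (2 * (n - k)))
               for k in range(n // 2 + 1))

alpha = 0.4
for n in range(1, 5):
    r0, r1 = parity_bit_states(n, alpha)
    print(n, quantum_bhattacharyya(r0, r1), bhattacharyya_eq97(n, alpha))
print(bhattacharyya_eq97(2, alpha), abs(math.cos(2 * alpha)))   # n = 2 reduces to |C|
```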

For the Shannon distinguishability in the special case n = 2, BMS Eq. (44) gives

$$SD(\rho_0^{(2)}, \rho_1^{(2)}) = \frac{1}{2}(1 + C^2)\left[1 - H_2\!\left(\frac{C^2}{1 + C^2}\right)\right] + \frac{S^2}{2} . \qquad (98)$$



We are now in a position to substitute Eqs. (85), (97) and (98) into Eqs. (46), (47) and (48). Observe that Eq. (46) holds automatically: with $K = S^2$ and $B = |C|$ it reads $1 - |C| \le S^2 \le S$, which is true because $S^2 = 1 - C^2 = (1 - |C|)(1 + |C|)$ and $0 \le S \le 1$. Equations (47) and (48) are illustrated in Figures 2 and 3 respectively. The horizontal axis represents the angle $\alpha$ between $|\psi_1\rangle$ and $(1, 0)^T$, meaning that for $\alpha = \pi/4$ (≈ 0.785) the states $|\psi_0\rangle$ and $|\psi_1\rangle$ are orthogonal. The fact that the bounds based on the Bhattacharyya coefficient are fairly tight can be explained by the fact that the function $2\sqrt{x(1 - x)}$ resembles $H_2(x)$ quite well.
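The comparison shown in Figures 2 and 3 can also be reproduced in tabular form. The sketch below (plain Python; the helper names are hypothetical) evaluates $K = S^2$ from Eq. (85), $B = |C|$ from Eq. (97) and SD from Eq. (98). It then prints the lower and upper bounds of Eqs. (47) and (48) next to SD for a few angles between 0 and π/4.

```python
import math

def h2(x):
    """Binary entropy in bits, with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def parity_n2(alpha):
    """K, B and SD for the n = 2 parity bit via Eqs. (85), (97) and (98)."""
    C, S = math.cos(2 * alpha), math.sin(2 * alpha)
    k = S ** 2
    b = abs(C)
    sd = 0.5 * (1 + C ** 2) * (1 - h2(C ** 2 / (1 + C ** 2))) + S ** 2 / 2
    return k, b, sd

def bounds(k, b):
    """(lower, upper) bounds on SD from Eq. (47) and from Eq. (48)."""
    eq47 = (1 - h2(0.5 - 0.5 * k), k)
    eq48 = (1 - b, 1 - h2(0.5 - 0.5 * math.sqrt(max(0.0, 1 - b * b))))
    return eq47, eq48

for i in range(9):
    alpha = (math.pi / 4) * i / 8
    k, b, sd = parity_n2(alpha)
    (lo47, hi47), (lo48, hi48) = bounds(k, b)
    print(f"alpha={alpha:.3f}  (47): {lo47:.4f} <= {sd:.4f} <= {hi47:.4f}"
          f"  (48): {lo48:.4f} <= {sd:.4f} <= {hi48:.4f}")
```

The gap between the two sides of Eq. (48) stays noticeably smaller than that of Eq. (47) over most of the range, in line with the remark above.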
Figure 3: Equation (48) for the parity bit with n = 2 and with $\alpha \in [0, \pi/4]$ on the horizontal axis.